MXNet workshop Dec 2020 presentation on the array API standardization effort ongoing in the Consortium for Python Data API Standards - see data-apis.org
NumPy Roadmap presentation at NumFOCUS Forum - Ralf Gommers
This presentation is an attempt to summarize the NumPy roadmap and both technical and non-technical ideas for the next 1-2 years to users that heavily rely on NumPy, as well as potential funders.
Python array API standardization - current state and benefits - Ralf Gommers
Talk given at GTC Fall 2021.
The Python array API standard, which was first announced towards the end of 2020, is maturing and becoming available to Python end users. NumPy now has a reference implementation, PyTorch support is close to complete, and other libraries have started to implement support. In this talk we will discuss the current state of implementations, and look at a concrete use case of moving a scientific analysis workflow to using the API standard - thereby gaining access to GPU acceleration.
This document discusses recent updates to NumPy and SciPy. Key updates include a complete overhaul of NumPy's random number generators and Fourier transform implementations. NumPy's __array_function__ protocol is now enabled by default, allowing other libraries to reuse the NumPy API. The NumPy array protocols were developed to separate the NumPy API from its execution engine. This avoids ecosystem fragmentation and allows the NumPy API to work with GPUs and distributed arrays via libraries like Dask. SciPy's FFT functions were reimplemented for increased speed and accuracy, and a new scipy.fft submodule was added, representing the first new SciPy submodule in a decade. Additional new global optimizers were also added to SciPy.
The evolution of array computing in Python - Ralf Gommers
My PyData Amsterdam 2019 presentation.
Have you ever wanted to run your NumPy based code on multiple cores, or on a distributed system, or on your GPU? Wouldn't it be nice to do this without changing your code? We will discuss how NumPy's array protocols work, and provide a practical guide on how to start using them. We will also discuss how array libraries in Python may evolve over the next few years.
( Python Training: https://www.edureka.co/python )
This Edureka Python Numpy tutorial (Python Tutorial Blog: https://goo.gl/wd28Zr) explains what exactly is Numpy and how it is better than Lists. It also explains various Numpy operations with examples.
Check out our Python Training Playlist: https://goo.gl/Na1p9G
This tutorial helps you to learn the following topics:
1. What is Numpy?
2. Numpy v/s Lists
3. Numpy Operations
4. Numpy Special Functions
This document provides a summary of the history and capabilities of SciPy. It discusses how SciPy was founded in 2001 by Travis Oliphant with packages for optimization, sparse matrices, interpolation, integration, special functions, and more. It highlights key contributors to the SciPy community and ecosystem. It describes why Python is well-suited for technical computing due to its syntax, built-in array support, and ability to support multiple programming styles. It outlines NumPy's array-oriented approach and benefits for technical problems. Finally, it discusses new projects like Blaze and Numba that aim to further improve the SciPy software stack.
Python is the language of choice for data analysis.
The aim of this slide deck is to provide a learning path for people new to Python for data analysis, giving a comprehensive overview of the steps you need to learn to use Python for data analysis.
The document discusses optimizing Python code for high performance. It begins with examples showing how to optimize a function by avoiding attribute lookups, using list comprehensions instead of loops, and leveraging built-in functions like map. Next, it covers concepts like vectorization, avoiding interpreter overhead through just-in-time compilers and static typing with Cython. Finally, it discusses profiling code to find bottlenecks and introduces tools like Numba, PyPy and Numexpr that can speed up Python code.
This document discusses plotting data with Python and Pylab. It begins by describing a sample data table and the problem of reading and plotting the data. It then reviews options for plotting in Python like Pylab, Enthought, RPy, and Sage. The remainder of the document demonstrates how to use Pylab to read CSV data, and create bar charts, pie charts, line plots, and histograms of the sample data.
This document discusses data visualization tools in Python. It introduces Matplotlib as the first and still standard Python visualization tool. It also covers Seaborn which builds on Matplotlib, Bokeh for interactive visualizations, HoloViews as a higher-level wrapper for Bokeh, and Datashader for big data visualization. Additional tools discussed include Folium for maps, and yt for volumetric data visualization. The document concludes that Python is well-suited for data science and visualization with many options available.
The document introduces Scipy, Numpy and related tools for scientific computing in Python. It provides links to documentation and tutorials for Scipy and Numpy for numerical operations, Matplotlib for data visualization, and IPython for an interactive coding environment. It also includes short examples and explanations of Numpy arrays, plotting, data analysis workflows, and accessing help documentation.
Get Your Hands Dirty with Intel® Distribution for Python* - Intel® Software
This session reviews using Intel® Distribution for Python* and Intel® Data Analytics Acceleration Library (Intel® DAAL) to accelerate data analytics and machine learning algorithms. Build fast machine learning applications in Python with tools from Intel. Get an introduction to the Python API in Intel DAAL. Intel Distribution for Python is used as the foundation of all the session tasks. Implement some common algorithms, such as K-means, linear regression, multiclass support vector machine (SVM), and neural networks using Intel DAAL. This session uses real-world data collections available online, such as those from University of California, Irvine, Machine Learning Repository.
The document discusses setting up a 4-node MPI Raspberry Pi cluster and Hadoop cluster. It describes the hardware and software needed for the MPI cluster, including 4 Raspberry Pi 3 boards, Ethernet cables, micro SD cards, and MPI software. It also provides an overview of Hadoop, a framework for distributed storage and processing of big data, noting its origins from Google papers and use by companies like Amazon, Facebook, and Netflix.
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ... - Intel® Software
This document discusses profiling Python code to optimize performance. It provides an overview of different types of profilers, including event-based, instrumentation-based, and statistical profilers. It then demonstrates how to use Intel VTune Amplifier to profile Python code with low overhead. Key steps include creating a project, running a basic hotspot analysis, and interpreting the results to identify optimization opportunities. Mixed Python/C++ profiling is also supported to optimize Python code calling native extensions.
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2... - MLconf
Practical Probabilistic Programming with Figaro: Probabilistic reasoning enables you to predict the future, infer the past, and learn from experience. Probabilistic programming enables users to build and reason with a wide variety of probabilistic models without machine learning expertise. In this talk, I will present Figaro, a mature probabilistic programming system with many applications. I will describe the main design principles of the language and show example applications. I will also discuss our current efforts to fully automate and optimize the inference process.
Numba is an open source just-in-time compiler for Python that uses the LLVM compiler infrastructure to generate optimized machine code from Python syntax. It allows Python code to be accelerated by running on multicore CPUs or GPUs. Numba can compile NumPy array expressions and ufuncs, parallel for loops, and user-defined Python functions to run at native speeds without rewriting in a different language. It provides an easy to use interface and can achieve large speedups of over 1000x compared to Python.
An adaptive algorithm for detection of duplicate records - Likan Patra
The document proposes an adaptive algorithm for detecting duplicate records in a database. The algorithm hashes each record to a unique prime number. It then divides the product of prior prime numbers by the new record's prime number. If it divides evenly, the record is duplicate. Otherwise, it is distinct and the product is updated with the new prime number, making the algorithm adaptive. The algorithm aims to reduce duplicate detection costs while maintaining scalability and caching prior records.
An introductory talk on scientific computing in Python. Statistics, probability, and linear algebra are important aspects of computing and computer modeling, and these are covered here.
This document provides an overview of NumPy, an open source Python library for numerical computing and data analysis. It introduces NumPy and its key features like N-dimensional arrays for fast mathematical calculations. It then covers various NumPy concepts and functions including initialization and creation of NumPy arrays, accessing and modifying arrays, concatenation, splitting, reshaping, adding dimensions, common utility functions, and broadcasting. The document aims to simplify learning of these essential NumPy concepts.
The document provides an overview of the Matplotlib library architecture and its key components. It discusses the three layers of Matplotlib - the backend layer, artist layer, and scripting layer. The backend layer handles rendering plots into different formats. The artist layer contains classes that generate visual elements. The scripting layer provides interfaces for users to access the other layers and generate figures and plots. It also outlines some common plot types and customization techniques in Matplotlib.
GNU Octave is an open-source program that is very similar to MATLAB. It can be used for numerical computations and has many of the same features as MATLAB, including matrices as a fundamental data type, support for complex numbers, built-in math functions and libraries, and support for user-defined functions. While most MATLAB programs will run in Octave, there are some minor syntactic differences between the programs and some functionality, like integration and differentiation, is implemented differently.
Numba: Array-oriented Python Compiler for NumPy - Travis Oliphant
Numba is a Python compiler that translates Python code into fast machine code using the LLVM compiler infrastructure. It allows Python code that works with NumPy arrays to be just-in-time compiled to native machine instructions, achieving performance comparable to C, C++ and Fortran for numeric work. Numba provides decorators like @jit that can compile functions for improved performance on NumPy array operations. It aims to make Python a compiled and optimized language for scientific computing by leveraging type information from NumPy to generate fast machine code.
This document provides an overview of C++ programming concepts. It introduces C++, focusing on programming concepts and design techniques rather than technical language details. It discusses procedural and object-oriented programming concepts, and how C++ supports both as well as generic programming. The key characteristics of object-oriented programming languages are explained as encapsulation, inheritance, and polymorphism.
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ... - Intel® Software
In part two of this presentation, continue to learn about the latest developments and tools for high-performance Python* for scikit-learn*, NumPy, SciPy, Pandas, mpi4py, and Numba*. Apply low-overhead profiling tools to analyze mixed C, C++, and Python applications to detect performance bottlenecks in the code and to pinpoint hotspots as the target for performance tuning. Get the best performance from your Python application with the best-known methods, tools, and libraries.
This document provides an overview of NumPy, a fundamental Python library for numerical computing and data science. It discusses how NumPy enables fast and expressive array computing in Python, allowing operations on whole arrays to be performed efficiently at speeds approaching those of low-level languages like C. NumPy arrays store data in a single block of memory and use broadcasting rules to perform arithmetic on arrays with different (but broadcastable) shapes. NumPy also supports multidimensional indexing and slicing that can return views into arrays without copying data.
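For example, a short sketch of the broadcasting and view behaviour described above (illustrative only, not taken from the document):

import numpy as np

a = np.ones((3, 4))
b = np.arange(4)
c = a + b            # broadcasting: shape (4,) stretches across shape (3, 4)

view = a[:, :2]      # slicing returns a view; no data is copied
view[:] = 0          # modifying the view also modifies a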
AI & Topology concluding remarks - "The open-source landscape for topology in... - Umberto Lupo
A short concluding speech for the AI & Topology session at the 2020 Applied Machine Learning Days (28 January 2020).
We remark on the strength of the case for using topological methods in various domains of machine learning. We then comment on our views on integrating topology with the practice of machine learning at a fundamental level. We give an (inexhaustive) overview of the open-source landscape for topological machine learning and data analysis, including our contribution, the giotto-tda Python package. Finally, we mention some promising future directions in the field.
Standardizing arrays -- Microsoft Presentation - Travis Oliphant
This document discusses standardizing N-dimensional arrays (tensors) in Python. It proposes creating a "uarray" interface that downstream libraries could use to work with different array implementations in a common way. This would include defining core concepts like shape, data type, and math operations for arrays. It also discusses collaborating with mathematicians on formalizing array operations and learning from NumPy's generalized ufunc approach. The goal is to enhance Python's array ecosystem and allow libraries to work across hardware backends through a shared interface rather than depending on a single implementation.
The road ahead for scientific computing with Python - Ralf Gommers
1) The PyData ecosystem, including NumPy, SciPy, and scikit-learn, faces technical challenges related to fragmentation of array libraries, lack of parallelism, packaging constraints, and performance issues for non-vectorized algorithms.
2) There are also social challenges around sustainability of key projects due to limited funding and maintainers, tensions with proprietary involvement from large tech companies, and academia's role in supporting open-source scientific software.
3) NumPy is working to address these issues through efforts like the Array API standardization, improved extensibility and performance, and growing autonomous teams and diversity within the community.
NumPy is a Python library that provides multidimensional arrays and matrices for numerical computing along with high-level mathematical functions to operate on these arrays. NumPy arrays can represent vectors, matrices, images, and tensors. NumPy allows fast numerical computing by taking advantage of optimized low-level C/C++ implementations and parallel computing on multicore processors. Common operations like element-wise array arithmetic and universal functions are much faster with NumPy than with native Python.
NumPy and SciPy provide MATLAB-like functionality for numerical computing in Python. NumPy features typed multidimensional arrays for fast numerical computations like matrix math, and is much faster than pure Python for tasks like matrix multiplication. NumPy arrays can represent vectors, matrices, images, tensors, and more. NumPy provides functions for creating, manipulating, and performing mathematical operations on arrays, and broadcasting rules allow arrays of different dimensions to be combined in element-wise operations.
Crafting Your Own Numpy: Do More in C++ and Make It Python @ PyCon JP 2024 - Anchi Liu
The slides presenting for the talk "Crafting Your Own Numpy: Do More in C++ and Make It Python" at PyCon JP 2024, https://2024.pycon.jp/en/talk/XXCCQR
NumPy is a Python package that provides multidimensional array and matrix objects as well as tools to work with these objects. It was created to handle large, multi-dimensional arrays and matrices efficiently. NumPy arrays enable fast operations on large datasets and facilitate scientific computing using Python. NumPy also contains functions for Fourier transforms, random number generation and linear algebra operations.
This document summarizes a presentation given by Diane Mueller from ActiveState and Dr. Mike Müller from Python Academy. It compares MATLAB and Python capabilities for scientific computing. Python has many libraries like NumPy, SciPy, IPython and matplotlib that provide similar functionality to MATLAB. Together these are often called "Pylab". The presentation provides an overview of Python, NumPy arrays, visualization with matplotlib, and integrating Python with other languages.
A lecture given for Stats 285 at Stanford on October 30, 2017. I discuss how OSS technology developed at Anaconda, Inc. has helped to scale Python to GPUs and Clusters.
Keynote talk at PyCon Estonia 2019 where I discuss how to extend CPython and how that has led to a robust ecosystem around Python. I then discuss the need to define and build a Python extension language I later propose as EPython on OpenTeams: https://openteams.com/initiatives/2
Travis Oliphant "Python for Speed, Scale, and Science" - Fwdays
Python is sometimes discounted as slow because of its dynamic typing and interpreted nature, and as unsuitable for scale because of the GIL. But in this talk, I will show how, with the help of talented open-source contributors around the world, we have been able to build systems in Python that are fast and scalable to many machines, and how this has helped Python take over science.
This document provides an overview of NumPy arrays, including how to create and manipulate vectors (1D arrays) and matrices (2D arrays). It discusses NumPy data types and shapes, and how to index, slice, and perform common operations on arrays like summation, multiplication, and dot products. It also compares the performance of vectorized NumPy operations versus equivalent Python for loops.
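For instance, a small timing sketch of the kind of comparison described (illustrative only; exact numbers depend on the machine):

import time
import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

start = time.perf_counter()
total = sum(x * y for x, y in zip(a, b))   # pure-Python loop
loop_time = time.perf_counter() - start

start = time.perf_counter()
total_np = np.dot(a, b)                    # vectorized NumPy operation
numpy_time = time.perf_counter() - start

print(f"loop: {loop_time:.3f}s, numpy: {numpy_time:.5f}s")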
This document is meant to be used together with the video session recorded with live execution. It is document no. 2 of the course "Introduction to Data Science using Python", which is a prerequisite of the Artificial Intelligence course at Ethans Tech.
Disclaimer: some of the images and content have been taken from multiple online sources, and this presentation is intended only for knowledge sharing.
SciPy and NumPy are Python packages that provide scientific computing capabilities. NumPy provides multidimensional array objects and fast linear algebra functions. SciPy builds on NumPy and adds modules for optimization, integration, signal and image processing, and more. Together, NumPy and SciPy give Python powerful data analysis and visualization capabilities. The community contributes to both projects to expand their functionality. Memory mapped arrays in NumPy allow working with large datasets that exceed system memory.
On the necessity and inapplicability of Python - Yung-Yu Chen
Python is a popular scripting language adopted by numerical software vendors to help users solve challenging numerical problems. It provides an easy-to-use interface and offers decent speed through array operations, but it is not suitable for engineering low-level constructs. To make good numerical software, developers need to be familiar with C++ and computer architecture. The gap in understanding between the high-level applications and the low-level implementation motivated me to organize a course to train computer scientists in what it takes to build numerical software that the users (application experts) want. This talk will portray a bird's-eye view of the advantages and disadvantages of Python, and of where and how C++ should be used in the context of numerical software. The information may be used to map out a plan to acquire the necessary skill sets for making such software.
Recording https://www.youtube.com/watch?v=OwA-Xt_Ke3Y
On the Necessity and Inapplicability of Python - Takeshi Akutsu
This document discusses the use of Python for numerical software development. It begins by introducing the author and their background in computational mechanics. It then discusses PyHUG, the Python user group in Taiwan, and PyCon Taiwan 2020.
The document notes that while Python is slow for number crunching, NumPy can provide reasonably fast performance. It explains that a hybrid architecture is commonly used, with the core computing kernel written in C++ for speed and Python used for the user-level API to describe complex problems more easily. An example of solving the Laplace equation is provided to demonstrate the speed differences between pure Python, NumPy, and C++ implementations.
The document advocates for training computer scientists in this hybrid approach through a numerical software development course.
Reliable from-source builds (Qshare 28 Nov 2023).pdf - Ralf Gommers
Short presentation covering some in-progress work around handling external (non-Python/PyPI) dependencies in Python package metadata and build steps better. Covers PEP 725 and what may come after.
The document discusses optimizing parallelism in NumPy-based programs. It provides examples of optimizing a main function from 50.1 ms to 2.83 ms using profiling and optimization. It discusses approaches for performant numerical code including vectorization and Python compilers. It also covers issues with oversubscription when using all CPU cores and parallel APIs in NumPy, SciPy, and scikit-learn. The document provides recommendations for tuning default parallel behavior and controlling parallelism in packages.
Pythran is a tool that can be used to accelerate SciPy kernels by transpiling pure Python and NumPy code into efficient C++. SciPy developers have started using Pythran for some computationally intensive kernels, finding it easier to write fast code with than alternatives like Cython or Numba. Initial integration into the SciPy build process has gone smoothly. Ongoing work includes porting more kernels to Pythran and exploring combining it with CuPy for fast CPU and GPU code generation.
Lightning talk given at the kickoff meeting for cycle 1 of the Essential Open Source Software for Science grant program from the Chan Zuckerberg Initiative (https://chanzuckerberg.com/eoss/).
Inside NumPy: preparing for the next decade - Ralf Gommers
Talk given at SciPy'19. Abstract:
Over the past year, and for the first time since its creation, NumPy has been operating with dedicated funding. NumPy developers think it has invigorated the project and its community. But is that true, and how can we know? We will give an overview of the actions we've taken to improve the sustainability of the NumPy project and its community. We will draw some lessons from a first year of grant-funded activity, discuss key obstacles faced, attempt to quantify what we need to operate sustainably, and present a vision for the project and how we plan to realize it.
Authors: Ralf Gommers, Matti Picus, Tyler Reddy, Stefan van der Walt and Charles Harris
SciPy 1.0 and Beyond - a Story of Community and Code - Ralf Gommers
My keynote at the SciPy 2018 conference, on SciPy 1.0 and community building.
Video: https://www.youtube.com/watch?v=oHmm3mPxg6Y&t=758s
3. Today’s Python data ecosystem
Can we make it easy to build on top of multiple array data structures?
4. State of compatibility today
All libraries have common concepts and functionality.
But there are many small (and some large) incompatibilities. It's very painful to translate code from one array library to another.
Let’s look at some examples!
5. Consortium for Python Data API Standards
A new organization, with participation from maintainers of many array (or
tensor) and dataframe libraries.
Concrete goals for first year:
1. Define a standardization methodology and necessary tooling for it
2. Publish an RFC for an array API standard
3. Publish an RFC for a dataframe API standard
4. Finalize 2021.0x API standards after community review
See data-apis.org and github.com/data-apis for more on the Consortium
6. Goals for and scope of the array API
Goal 1: enable writing code & packages that support multiple array libraries
Goal 2: make it easy for end users to switch between array libraries
In scope:
● Syntax and semantics of functions and objects in the API
● Casting rules, broadcasting, indexing, Python operator support
● Data interchange & device support
Out of scope:
● Execution semantics (e.g. task scheduling, parallelism, lazy evaluation)
● Non-standard dtypes, masked arrays, I/O, subclassing the array object, C API
● Error handling & behaviour for invalid inputs to functions and methods
7. Array- and array-consuming libraries
Data interchange between array libraries: using DLPack, this will work for any two libraries if they support the device the data resides on:
x = xp.from_dlpack(x_other)
Portable code in array-consuming libraries:
def softmax(x):
    # grab the standard namespace from the passed-in array
    xp = get_array_api(x)
    x_exp = xp.exp(x)
    partition = xp.sum(x_exp, axis=1, keepdims=True)
    return x_exp / partition
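The get_array_api helper above is the slide's shorthand; the standard exposes the namespace through the array's __array_namespace__ method (see slide 8). A minimal sketch of what such a helper could look like, under that assumption:

def get_array_api(x):
    # Ask the array which standard namespace it implements; conforming
    # arrays expose the __array_namespace__ method.
    return x.__array_namespace__()

# softmax(x) then works unchanged whether x comes from NumPy, CuPy,
# PyTorch, etc., as long as the library provides a conforming namespace.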
8. What does the full API surface look like?
● 1 array object with
○ 6 attributes: ndim, shape, size, dtype, device, T
○ dunder methods to support all Python operators
○ __array_api_version__, __array_namespace__, __dlpack__
● 11 dtype literals: bool, (u)int8/16/32/64, float32/64
● 1 device object
● 4 constants: inf, nan, pi, e
● ~115 functions:
○ Array creation & manipulation (18)
○ Element-wise math & logic (55)
○ Statistics (7)
○ Linear algebra (22)
○ Search, sort & set (7)
○ Utilities (4)
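As a quick illustration of this surface, a sketch using NumPy's reference implementation (the experimental numpy.array_api module is assumed to be available here; the names used all come from the standard):

import numpy.array_api as xp

x = xp.asarray([[1.0, 2.0], [3.0, 4.0]], dtype=xp.float32)
print(x.ndim, x.shape, x.dtype, x.device)  # attributes on the array object
y = xp.matmul(x, x.T)                      # linear algebra function, T attribute
m = xp.mean(y)                             # statistics function
print(float(m), xp.pi)                     # constants such as pi live in the namespace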
10. Mutability & copies/views
Mutable operations and the concept of views are important for strided in-memory array implementations (NumPy, CuPy, PyTorch, MXNet). They are problematic for libraries based on immutable data structures or delayed evaluation (TensorFlow, JAX, Dask).
x = ones(4)
# y may be a view on the data of x
y = x[:2]
# modifies x if y is a view
y += 1
Decisions in the API standard:
1. Support in-place operators
2. Support item and slice assignment
3. Do not support the out= keyword
4. Warn users that mixing mutating operations and views may result in implementation-specific behavior
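A minimal sketch of what these decisions mean in user code, assuming xp is a conforming namespace (the commented-out line only illustrates what the standard leaves out):

x = xp.zeros((3, 3))
x += 1                 # 1. in-place operators are supported
x[0, 0] = 42.0         # 2. item assignment is supported...
x[1, :] = xp.ones(3)   #    ...and so is slice assignment
# xp.add(x, x, out=x)  # 3. the out= keyword is not part of the standard
y = x[:2]
y += 1                 # 4. whether this also modifies x is implementation-specific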
11. Dtype casting rules
x = xp.arange(5)                  # will be an integer array
y = xp.ones(5, dtype=xp.float32)
# This may give float32, float64, or raise
dtype = (x * y).dtype
Casting rules are straightforward to align between libraries when the dtypes are of the same kind. Mixed integer and floating-point casting is very inconsistent between libraries, and hard to change: hence it will remain unspecified.
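To make the distinction concrete, a sketch assuming xp is a conforming namespace: same-kind promotion is fully specified, while mixed integer/floating-point promotion is left to each library.

a = xp.ones(3, dtype=xp.float32)
b = xp.ones(3, dtype=xp.float64)
(a + b).dtype       # float64: same-kind (float + float) promotion is specified

i = xp.arange(3)    # integer array
# (i + a).dtype     # unspecified: may be float32, float64, or raise,
#                   # depending on the library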
12. Data-dependent output shape/dtype
# Boolean indexing, and even slicing in some cases, results in shapes that depend on values in `x`
x2 = x[:, x > 3]
val = somefunc(x)
x3 = x[:val]
# Functions for which the output shape depends on values
unique(x)
nonzero(x)
# NumPy does value-based casting
x = np.ones(3, dtype=np.float32)
x + 1       # float32 output
x + 100000  # float64 output
Data-dependent output shapes or dtypes are problematic because of:
● static memory allocation (TensorFlow, JAX)
● graph-based scheduling (Dask)
● JIT compilation (Numba, PyTorch, JAX, Gluon)
Value-based dtype results can be avoided. Value-based shapes can be important - the API standard will include but clearly mark such functionality.
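To see why this is hard for graph-based scheduling, a small sketch assuming dask.array is installed: boolean indexing produces an array whose size cannot be known before the graph is executed.

import dask.array as da

x = da.arange(10, chunks=5)
masked = x[x > 3]        # output shape depends on the values in x
print(masked.shape)      # (nan,) - the size is unknown until computation
print(masked.compute())  # [4 5 6 7 8 9]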
13. Where are we now, and what’s next?
The array API standard is >90% complete and published for community review.
Still work-in-progress are:
● Data interchange with DLPack
● Device support
● Data-dependent shape handling
● A handful of regular functions (linalg, result_type, meshgrid)
Important next steps will be:
1. Complete the library-independent test suite
2. First (prototype) implementations in libraries
3. Get sign-off from maintainers of each array library
4. Define process to handle future & optional extensions
14. Thank you
Consortium:
● Website & introductory blog posts: data-apis.org
● Array API main repo: github.com/data-apis/array-api
● Latest version of the standard: data-apis.github.io/array-api/latest
● Members: github.com/data-apis/governance
Find me at: [email protected], rgommers, ralfgommers
Try this at home - installing the latest version of all seven array libraries in one env to experiment:
conda create -n many-libs python=3.7
conda activate many-libs
conda install cudatoolkit=10.2
pip install numpy torch jax jaxlib tensorflow mxnet cupy-cuda102 dask toolz sparse