Top free Enterprise Information Management Tools
Putting an enterprise-wide strategy in place for optimising your Data and Information Assets isn’t something that can be rushed. Most enterprise-wide strategies touch on a wide range of Data requirements including semantics, governance, architecture, quality, search, storage, modelling and many other concerns. Many start with feasibility, looking at obvious enablers and inhibitors such as personnel, budgets, risk appetite and so on. A large factor, however, is the knowledge and technical requirements of commonly used software tools.
We are big fans of starting small, learning by doing and trialling tools and approaches before making big commitments. Given this, we’ve put together a list of some of the top free Information Management tools. For those in the process of getting their Data Strategy out of the starting blocks, these represent a very low-cost way to start increasing maturity in key areas.
Open Source Data Quality and Profiling
For profiling, Data Quality & Data Migration Requirements
Identifying and managing data quality defects is critical for your Strategy to succeed. Growing data volumes and a widening range of data structures and technologies, however, make it increasingly necessary to learn the latest profiling techniques.
Open source DQ & Profiling represents a great way of learning key techniques such as outlier analysis without spending big bucks on the latest profiling software. It features outlier analysis, discovery, dashboards and support for ‘Big Data’ platforms such as Hadoop.
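To give a feel for what outlier analysis involves, here is a minimal sketch in plain Python. It is not taken from any of the tools listed here; the function name profile_column and the sample values are hypothetical, and the 1.5 x interquartile range rule is just one common convention for flagging outliers.

```python
import statistics


def profile_column(values):
    """Profile a numeric column: counts, range and IQR-based outliers.

    Hypothetical helper for illustration only; None marks a missing value.
    """
    present = [v for v in values if v is not None]

    # Quartiles of the non-missing values (cut points at 25%, 50%, 75%).
    q1, _median, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1

    # Common convention: flag anything beyond 1.5 * IQR from the quartiles.
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present),
        "max": max(present),
        "outliers": [v for v in present if v < lower or v > upper],
    }


# Made-up sample column with one missing value and one suspicious entry.
order_values = [10, 12, 11, 13, 12, None, 11, 250]
report = profile_column(order_values)
print(report["missing"], report["outliers"])
```

A dedicated profiling tool applies checks like this across every column and adds discovery and dashboards on top, but the underlying idea is the same.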
Data visualisation and Business Intelligence
Presenting visible and appealing insights through dashboards, reports and other analytics is key for most strategies. Whilst the BI market is becoming more commoditised and prices are falling, an investment in a fully fledged, general visualisation tool can still be substantial.
BIRT represents an excellent way to familiarise yourself with reporting, analytics and data visualisation for free. It has many extensible, enterprise-grade features and a long-established user base. Its ‘Big Data’ support has also been considerably extended in recent years.
Oracle SQL Developer Data Modeler
For Reverse & Forward Engineering Data Models
The growth of schema-less database technologies has made keeping track of your Data Assets through modelling more important than ever. With the latest software costing from hundreds to thousands of pounds, it’s often difficult to justify a purchase without the requisite experience and knowledge. So where do you start?
Oracle’s free tool represents a great way to start forward and reverse engineering a range of data models including logical, physical and dimensional. It features full source control and can be deployed in cloud environments, making it great for collaboration.
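For readers new to the terms, forward engineering means generating a physical schema from a model, and reverse engineering means recovering a model from an existing database. The sketch below illustrates both ideas using Python's built-in sqlite3 module. It does not use or represent Oracle's tool; the LOGICAL_MODEL structure, the table names and both function names are hypothetical.

```python
import sqlite3

# Hypothetical logical model: entity -> list of (attribute, type, is_primary_key).
LOGICAL_MODEL = {
    "customer": [("customer_id", "INTEGER", True), ("name", "TEXT", False)],
    "orders": [
        ("order_id", "INTEGER", True),
        ("customer_id", "INTEGER", False),
        ("total", "REAL", False),
    ],
}


def forward_engineer(model):
    """Turn the logical model into CREATE TABLE statements (physical model)."""
    statements = []
    for table, columns in model.items():
        cols = ", ".join(
            f"{name} {sql_type}{' PRIMARY KEY' if is_pk else ''}"
            for name, sql_type, is_pk in columns
        )
        statements.append(f"CREATE TABLE {table} ({cols})")
    return statements


def reverse_engineer(conn):
    """Rebuild the model structure by inspecting an existing SQLite database."""
    model = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        info = conn.execute(f"PRAGMA table_info({table})").fetchall()
        model[table] = [(row[1], row[2], bool(row[5])) for row in info]
    return model


conn = sqlite3.connect(":memory:")
for ddl in forward_engineer(LOGICAL_MODEL):
    conn.execute(ddl)

recovered = reverse_engineer(conn)
print(recovered == LOGICAL_MODEL)
```

A modelling tool does the same round trip with far richer metadata (relationships, constraints, diagrams), which is why it becomes valuable once schemas grow beyond a handful of tables.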
For Data Integration, ETL & Master Data Management
Investing in an enterprise solution for Data transport, integration and managing key master data sets and rules can be challenging. Concerns around vertical integration, interoperability and being wedded to a single vendor can often get in the way of getting on with practical delivery.
Talend’s open source, modular integration stack enables users to get a feel for what living with an enterprise-grade piece of software is like without the risks of a large software purchase. The company has an impressive list of blue-chip clients, a large user base and an online support network. It has also recently invested heavily in ‘Big Data’ capabilities, making it even more attractive.
For Data Science, Machine Learning & Data Mining
Sometimes general BI and visualisation tools aren’t sophisticated enough and more advanced statistical, Data Science and Machine Learning functionality is required. Often more advanced tools can be quite intimidating or very expensive, particularly if you are just starting to scratch the surface with decision trees, regression, classification and other techniques.
Orange Canvas started as an attempt by the Artificial Intelligence community to democratise and advance Machine Learning. That was back in 2007 and the platform has advanced considerably since then. It is a visually rich, flexible and Python-compatible environment for getting a feel for this landscape. It has a large user community and is used extensively by academics. Highly recommended.
For Text Mining, extraction and Inference
Semantic tagging, search and Knowledge Management form a fast-growing area, with new tools, techniques and platforms emerging all the time. Given the exponential growth of unstructured content, automated techniques such as Text Mining, Content Classification and Extraction are also becoming more and more essential.
For many organisations, taking the first steps with text mining and understanding its business benefits for Data Governance and other initiatives can be challenging. Datumbox makes this a lot easier with its cloud facilities such as sentiment analysis, topic classification and concept extraction. There are huge cost and time savings to be gained in most organisations by making information easier to find, and this is a great tool for illustrating how to start.
For Glossary / Vocabulary Management & Ontology Development
Semantic technology is one of the fastest growing and most exciting IT sectors at present, for good reason. As systems and processes become increasingly interconnected, the importance of commonly understood terms and definitions has never been greater.
Protégé, by Stanford University, is ideal for starting to take control of your semantic landscape by using the latest ontology modelling techniques. It has a large user base and support community as well as a wealth of online training materials and courses.