The Data City
Data as a Service
We deliver comprehensive company data for the UK and Ireland, with expansion plans underway
to include France, Germany and North America. Our flexible data structure enables seamless
integration of a wide range of datasets.
Classification Expertise
• Real-Time Industrial Classifications (RTICs): Leveraging website crawling and machine
learning, we classify companies into relevant sectors not captured by traditional SIC codes or
standard classification systems.
• Real-Time SIC Codes (RSICs): We enhance accuracy in company classification by
reassigning SIC codes based on current website content, correcting outdated information and
user errors.
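As a rough illustration of classifying companies from website text, here is a minimal sketch. It swaps the machine learning described above for simple keyword counting, so it is a toy stand-in rather than The Data City's method. The sector names, keyword lists, and the `classify` function are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical sector vocabularies (an assumption for illustration only).
# A real classifier would learn its signals from labelled website text.
SECTOR_KEYWORDS = {
    "Net Zero": {"solar", "renewable", "carbon", "battery", "emissions"},
    "FinTech": {"payments", "lending", "banking", "wallet", "trading"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify(website_text, min_hits=2):
    """Return every sector whose keywords occur at least `min_hits` times.

    A company can match several sectors, or none at all.
    """
    counts = Counter(tokenize(website_text))
    scores = {
        sector: sum(counts[word] for word in keywords)
        for sector, keywords in SECTOR_KEYWORDS.items()
    }
    return sorted(sector for sector, score in scores.items() if score >= min_hits)


if __name__ == "__main__":
    text = "We install solar panels and battery storage to cut carbon emissions."
    print(classify(text))
```

Returning a list rather than a single label reflects the point that one company may belong to several sectors at once.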
Data Science & Software Development
Our team brings deep expertise in advanced data analysis, processing, and software
development to power intelligent data solutions.
The UK Data
Open Data
Companies House
The Web
Innovate UK
Social Media
OECD Functional Urban Areas demarcations
Partners’ data
Creditsafe
Dealroom
Lightcast
Proprietary metrics
Growth measures (year-on-year change in company numbers, turnover, employees…)
Location quotients
Innovation scores
Estimated GVA
Estimated CO2 emissions
Company size
We provide data on all 4.5M+ active companies registered with Companies House, plus inactive ones,
enabling analysis of company lifecycles and group structures. Partner data is also appended at the company
level. The data is made available through the Industry Engine, an online platform that allows users to
explore, visualise, analyse and download the data.
We’ve matched ~1.5M companies to their websites and extracted text content, enabling RTIC and RSIC
classification through machine learning and text analysis.
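The location quotients listed among the proprietary metrics can be illustrated with the standard textbook definition: a sector's share of local activity divided by its share of national activity. The sketch below assumes company counts as the unit. Whether The Data City computes its quotients on company counts, employment, or turnover is not stated here, and the function name and figures are hypothetical.

```python
def location_quotient(local_sector, local_total, national_sector, national_total):
    """Standard location quotient: local sector share / national sector share.

    A value above 1 means the sector is more concentrated in the area than
    in the country as a whole; a value below 1 means it is less concentrated.
    """
    if min(local_total, national_sector, national_total) <= 0:
        raise ValueError("totals and the national sector count must be positive")
    local_share = local_sector / local_total
    national_share = national_sector / national_total
    return local_share / national_share


if __name__ == "__main__":
    # Made-up figures: 50 of 1,000 local companies are in the sector,
    # against 2,000 of 100,000 nationally (5% locally versus 2% nationally).
    print(round(location_quotient(50, 1_000, 2_000, 100_000), 2))
```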
Application of TDC data: public sector
We collaborate with policymaking
institutions at all levels—from local to
national—transforming big data into
meaningful insights. Our work supports
evidence-based decision-making across a
broad range of users.
Innovation Clusters map: public-facing tool
developed for the Department for Science,
Innovation and Technology that allows users
to understand the clustering of companies
across 40+ emergent technology sectors.
The map on the right shows overlapping
clusters of companies across the sectors
listed in the legend—all of which together
comprise the Life Sciences Ecosystem.
Application of TDC data: private sector
We collaborate with financial institutions to address data limitations that affect decision-making
and operational efficiency.
Why RSICs Matter: RSICs often provide a more accurate picture of a company’s real activity
than SIC codes—improving risk assessments, compliance, and client understanding.
The case of financial institutions:
Companies choose a SIC code at incorporation. It shapes critical decisions across the
financial system:
• Bank account eligibility
• Risk scoring & KYB checks
• AML monitoring & regulatory reporting
• GDP stats & economic modelling
Sometimes a company’s SIC does not reflect its true activity, causing significant problems.
NatWest fined £264.8M
• High-risk customer misclassified as low-risk
• SIC code didn’t match actual activity
• Enhanced due diligence not triggered
Santander fined £107M
• FX trading firm onboarded without proper risk
checks
• Classification failed to reflect true business
model
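One way a financial institution might use RSICs alongside registered SIC codes is to flag companies whose website-derived code differs from the self-declared one and falls in a higher-risk category. The following is a minimal sketch under stated assumptions: the `Company` record, the `needs_review` rule, and the contents of `HIGH_RISK_SIC` are hypothetical and do not describe any bank's actual controls.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of SIC codes treated as higher risk (an assumption;
# each institution defines its own risk categories).
HIGH_RISK_SIC = {"64999", "92000"}


@dataclass
class Company:
    name: str
    registered_sic: str      # code the company chose at incorporation
    rsic: Optional[str]      # website-derived code; None without a website match


def needs_review(company):
    """Flag a company whose RSIC disagrees with its SIC and is higher risk."""
    if company.rsic is None:
        return False  # no website match, so there is nothing to compare
    mismatch = company.rsic != company.registered_sic
    return mismatch and company.rsic in HIGH_RISK_SIC


if __name__ == "__main__":
    companies = [
        Company("Example Trading Ltd", registered_sic="47110", rsic="64999"),
        Company("Example Retail Ltd", registered_sic="47110", rsic="47110"),
        Company("No Website Ltd", registered_sic="47110", rsic=None),
    ]
    for company in companies:
        print(company.name, needs_review(company))
```

Companies without a website match are skipped rather than flagged, since no RSIC can be assigned without one.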
Data limitations
1. We need a company-website match to apply RTIC and RSIC classification
methodologies appropriately.
2. Errors can be inherited from data providers: sources like Creditsafe, Dealroom, and
Lightcast may contain inaccuracies, such as outdated addresses or duplicate
records.
3. Although efforts are made to keep data current, some updates—like changes in
websites or the computation of metrics—can lag behind real-world events.
4. Despite using machine learning for classification, companies operating across multiple
sectors or with limited web presence may be misclassified or oversimplified.
The Data City
Harnessing AI as a Co-Creator for UK SMEs in Enhancing Productivity and Sustainability
Thank you
If you have any questions, please
contact us:
fatima.garcia@thedatacity.com

The Data City, United Kingdom - Fatima Garcia