Monthly Archives: January 2022

Enterprise Knowledge Graphs and the need for them

Today, most enterprises have Business Intelligence (BI) and analytics teams. They address the time-sensitive, operational needs of the organization. Importantly, business decisions are taken based on insights discovered from these platforms, and AI/ML often helps project into the future. The need for BI platforms and a skilled workforce is acknowledged and highly valued by the CXO team. Data from various departments, including sales and R&D, flows into BI platforms via data warehouses/data lakes/lake houses. However, direct access to operational DBMS systems is still needed at times, and data may need to flow back, via reverse ETL, from data warehouses to DBMS systems.

The above scheme is mostly accepted as necessary. In reality, adoption, data lineage, speed of insight generation and subsequent discovery vary. Often, human insight is still ahead of the system. Here’s an opportunity for improvement.

One of the key additions to the above ecosystem could be Enterprise Knowledge Graphs (EKGs). They can address a critical need for ‘drilling down’ into the data to arrive at the ‘nugget of gold’. While this is feasible in the current scheme, it depends on human skill. A skilled CXO might be able to get to the ‘insight’ with the right ‘SQL’ query (they may or may not write it themselves). This is not uncommon.
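To make the ‘right SQL query’ idea concrete, here is a minimal sketch of the kind of ad-hoc drill-down an analyst might run. The schema, table and figures are hypothetical, chosen only for illustration:

```python
import sqlite3

# Hypothetical schema: drilling down from regional sales into product-level detail.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, product TEXT, amount REAL);
INSERT INTO sales VALUES
  ('APAC', 'Widget', 120.0),
  ('APAC', 'Gadget', 80.0),
  ('EMEA', 'Widget', 200.0);
""")

# The kind of ad-hoc 'drill-down' query a skilled analyst might write:
rows = conn.execute("""
  SELECT region, product, SUM(amount) AS total
  FROM sales
  GROUP BY region, product
  HAVING total > 100
  ORDER BY total DESC
""").fetchall()

for region, product, total in rows:
    print(region, product, total)
```

The human still has to know which grouping and threshold surface the ‘nugget of gold’; the tooling only executes the decision.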

EKGs have the potential to bring together key ‘identities’ and their ‘relationships’ across the organization: people, departments, products, customers, geography, time, research, language and their inter-dependencies. The ‘operational’ facts can/should continue to come from the data warehouse/data lake/lake house.
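The identities-and-relationships idea can be sketched as subject–predicate–object triples. Below is a toy, stdlib-only sketch (no real triple store); the entity and relationship names are illustrative assumptions, not any organization’s actual model:

```python
# A minimal EKG sketch: facts as (subject, predicate, object) triples.
triples = {
    ("alice", "works_in", "oncology_dept"),
    ("bob", "works_in", "oncology_dept"),
    ("oncology_dept", "owns_product", "trial_tracker"),
    ("trial_tracker", "sold_in", "emea"),
}

def related(subject, predicate):
    """Objects linked to `subject` via `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}

# Multi-hop traversal: which regions does Alice's department sell into?
regions = {
    region
    for dept in related("alice", "works_in")
    for product in related(dept, "owns_product")
    for region in related(product, "sold_in")
}
print(regions)  # {'emea'}
```

Multi-hop questions like this are natural on a graph, whereas in a warehouse they require joins the analyst must already know how to write.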

Can an organization achieve the benefits of an EKG by leveraging investments in a ‘Master Data Management’ system? Yes, partially. In practice, ‘MDM’ is not brought into BI platforms; it’s siloed and has limited visibility. ‘Mastering’ data is considered a data engineering act. Instead, an EKG system would address the organization’s needs more holistically when it’s integrated into BI platforms through GraphQL.

The understanding needed to build an EKG comes naturally to any organization’s teams; they know it intuitively. Skills and standards may still be evolving. Web3 and the subsequent conversations around the semantic web are helping bridge the gaps, though most of those conversations are about blockchains — a necessary area that needs a focused effort of its own in the very near future. EKGs, though, can be built now and can provide value right away.

Let us know if EKGs or the Semantic Web interest you. Here’s an open knowledge graph that can help you draw an analogy to your organizational needs. Write to us. We are happy to help.

Open knowledge graph on clinical trials

VaidhyaMegha is building an open knowledge graph on clinical trials, comprising

  • Clinical trial ids from across the globe
  • MeSH ids for
    • Symptoms/Phenotype
    • Diseases
  • PubMed Article ids
  • Genotype (from Human Genome)


Below is a very brief specification:

  • Inputs
    • Mesh RDF
    • WHO’s clinical trials database – ICTRP.
    • US clinical trial registry data from CTTI’s AACT database.
    • Data from clinical trial registries across the globe, scraped from their websites (e.g., India)
    • MEDLINE Co-Occurrences (MRCOC) Files
  • Outputs
    • Clinical Trials RDF with below constituent ids and their relationships
      • MeSH, Clinical Trial, PubMed Article, Symptom/Phenotype, Genotype (from Human Genome)
      • Additionally, clinical trial -> clinical trial links across trial registries will also be discovered and added.
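The output format above can be illustrated with a small sketch that emits N-Triples linking trial ids to MeSH condition ids. This is not the project’s actual pipeline; the `example.org` predicate and trial URIs are assumptions made for illustration (the MeSH URI base follows the published MeSH RDF pattern):

```python
# Illustrative sketch: serialize (trial id, MeSH id) links as N-Triples lines.
def to_ntriples(trial_id, mesh_id):
    subject = f"<http://example.org/trial/{trial_id}>"       # assumed URI scheme
    predicate = "<http://example.org/hasCondition>"          # assumed predicate
    obj = f"<http://id.nlm.nih.gov/mesh/{mesh_id}>"          # MeSH RDF URI pattern
    return f"{subject} {predicate} {obj} ."

# Hypothetical trial/MeSH pairs, for illustration only.
links = [("NCT00000102", "D003924"), ("NCT00000104", "D018352")]
rdf_lines = [to_ntriples(t, m) for t, m in links]
print("\n".join(rdf_lines))
```

In a real pipeline, the pairs would come from the inputs listed above (ICTRP, AACT, MRCOC) rather than being hard-coded.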

Source code

  • Source code will be hosted here.

Release notes

v0.2 : 27-Jan-2022

  • Clinical trials are linked to the RDF nodes corresponding to the MeSH terms for conditions.
  • Download the enhanced RDF from here.


VaidhyaMegha’s prior work on

  • clinical trial registries data linking.
  • symptoms to diseases linking.
  • phenotype to genotype linking.
  • trials to research articles linking.

The last three are covered in the examples folder; they were addressed in prior work in separate public repos.

FAQ : Cloud : NFRs, FOSS and Cost : TL;DR version

  1. Can you outsource the responsibility of achieving/managing your NFRs to Cloud providers – No
  2. Can you outsource the responsibility of decision making necessary to choose and balance your NFRs to cloud providers – Absolutely No
  3. Can you execute your decisions on NFRs faster with cloud providers – Absolutely Yes
  4. Can you outsource the responsibility of monitoring your NFRs to cloud providers – No
  5. Do cloud providers bring additional overheads to deal with, like previous iterations (e.g., data centers) – Absolutely Yes – e.g., cost management
  6. Should you avoid cloud altogether – Absolutely No
  7. Should you adopt cloud wholesale – Yes
  8. Can cloud providers fail wholesale – Yes
  9. Should you implement multi-cloud strategy right now – No
  10. Are all clouds same – No
  11. Are there standards around cloud – Yes
  12. Can cloud providers really help you build the most efficient version of your product for your chosen NFRs – Absolutely No
  13. Are there seemingly contradictory statements above – Yes

Note : This is why you need architects, and why every cloud provider offers blueprints, certifications, blogs and sample applications, and is constantly trying to refine, refactor and rebuild its services.

Source : Decades of experiences across many cloud providers.

Trigger : Recent conversations with entrepreneurs/clients/prospects.

Reference : What is an NFR? What is TL;DR

Update :

Work in progress, longer version : Feel free to copy. Your comments within the doc are going to be very helpful for the final version.


A simple time-series model, built using a spreadsheet alone, aimed at simulating changes in global population, including its peak and plateau. AI/ML models need training, validation and testing before use. Simple statistical/problem modeling with the necessary assumptions/parameters can help you simulate a real-world problem faster. This exercise is almost always necessary and is best done upfront.
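The same spreadsheet exercise can be sketched in a few lines of code. Below is a minimal discrete logistic-growth model of global population; the growth rate, starting population and carrying capacity are assumed parameters chosen for illustration, not fitted values:

```python
# Discrete logistic growth: population rises, then plateaus at the carrying capacity.
def simulate(p0, rate, capacity, years):
    pops = [p0]
    for _ in range(years):
        p = pops[-1]
        pops.append(p + rate * p * (1 - p / capacity))  # logistic update rule
    return pops

# Assumptions: ~8 billion today, 1% annual growth, 10.4 billion plateau.
series = simulate(8.0, 0.01, 10.4, 500)
print(round(series[-1], 2))  # approaches the plateau
```

Each row of the spreadsheet corresponds to one iteration of the update rule; changing the parameters in one cell is the spreadsheet equivalent of changing the function arguments here.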

Ping us if you have a data/analytics/AI-amenable problem; we would love to help you realize your business objectives faster and iteratively get you to the right value-based RoI state.

Note : Please feel free to copy and play around, or provide feedback as comments.

Author : Sandeep Kunkunuru

PS : This is a no-code/low-code problem modeling and simulation exercise. What is an AI-amenable problem? And why is statistical/problem modeling needed upfront? Here’s a relevant article to read further :