With all the rage around LLMs, here are some weekend thoughts on linguistics, computation, compression and storage, written in an LLM conversation style. They raise more questions, ideas and open explorations than they provide answers: https://lnkd.in/ee79eEzJ Here are a few quick shares: – Not all languages are equal in their information (orthographic) density, especially in their scripts (written language). Chinese and English are examples of this diversity. – Characters, character sets and encodings all have a role to play. – More importantly, are there encoding optimization opportunities to store information using languages with higher orthographic density, by using Hindi or Sanskrit instead of English […]
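The point that characters, character sets and encodings all play a role can be made concrete by measuring one greeting in three scripts. This is a minimal sketch; the sample words and the function name `encoding_footprint` are my own choices, not from the post.

```python
from typing import Tuple

def encoding_footprint(text: str) -> Tuple[int, int, int]:
    """Return (code points, UTF-8 bytes, UTF-16 bytes) for a string."""
    return (
        len(text),                      # Python str length counts Unicode code points
        len(text.encode("utf-8")),
        len(text.encode("utf-16-le")),  # "-le" avoids the 2-byte BOM that plain "utf-16" adds
    )

# Hypothetical sample greetings, written as escapes to avoid font issues.
SAMPLES = {
    "English": "hello",
    "Chinese": "\u4f60\u597d",                        # 你好
    "Hindi": "\u0928\u092e\u0938\u094d\u0924\u0947",  # नमस्ते
}

if __name__ == "__main__":
    for language, word in SAMPLES.items():
        points, utf8, utf16 = encoding_footprint(word)
        print(f"{language:8} code points={points:2} utf-8={utf8:2} utf-16={utf16:2}")
```

On these samples Chinese needs fewer code points than English but more UTF-8 bytes (6 vs 5), while UTF-16 reverses that (4 vs 10). The Hindi word counts 6 code points because the virama and vowel sign are separate code points. Any storage gain from a denser script therefore depends on the encoding chosen, not on the script alone.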
– Every entity has a lifecycle, which can be represented as a state machine. – One can represent this state machine as a decision tree. – Any entity’s lifecycle / decision tree / workflow / state machine will always have correlated / dependent events occurring in other entities’ lifecycles / decision trees / workflows / state machines. – The last statement is essentially system behavior (the heart of any enterprise system). – This interplay of decision trees can always be represented as a sparse matrix. As an architect, if you decide what should be done at each combination / […]
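The chain above (lifecycle, state machine, cross-entity interplay, sparse matrix) can be illustrated with two hypothetical entities, an Order and a Payment. The entity names, states and actions are assumptions for illustration only; the point is that the interplay table stores just the cells an architect actually decides on, so most of the state-combination matrix stays empty.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical "Order" lifecycle as a state machine: (state, event) -> next state.
ORDER_TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("created", "pay"): "paid",
    ("created", "cancel"): "cancelled",
    ("paid", "ship"): "shipped",
    ("shipped", "deliver"): "delivered",
}
ORDER_STATES: List[str] = ["created", "paid", "shipped", "delivered", "cancelled"]

# Hypothetical lifecycle of a related "Payment" entity.
PAYMENT_TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("pending", "authorize"): "authorized",
    ("authorized", "capture"): "captured",
    ("captured", "refund"): "refunded",
}
PAYMENT_STATES: List[str] = ["pending", "authorized", "captured", "refunded"]

def next_state(transitions: Dict[Tuple[str, str], str], state: str, event: str) -> str:
    """Follow one transition; unknown (state, event) pairs are rejected."""
    try:
        return transitions[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state!r}") from None

# Sparse interplay matrix: rows are Order states, columns are Payment states.
# Only combinations that need a system-level decision are stored.
INTERPLAY: Dict[Tuple[str, str], str] = {
    ("created", "captured"): "mark_order_paid",
    ("cancelled", "captured"): "issue_refund",
    ("shipped", "refunded"): "flag_for_review",
}

def system_action(order_state: str, payment_state: str) -> Optional[str]:
    """Return the decided action for a state combination, or None for 'no action'."""
    return INTERPLAY.get((order_state, payment_state))

def density() -> float:
    """Fraction of the full Order x Payment matrix that actually holds a decision."""
    return len(INTERPLAY) / (len(ORDER_STATES) * len(PAYMENT_STATES))
```

Here only 3 of the 20 possible cells carry a decision; a real system with many entities would be far sparser, which is why a dictionary keyed by state tuples is a natural representation.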
Every organization generates tons of data these days. However, most of it is never analyzed and therefore never acted upon. Some of the biggest barriers to data-driven decisions in organizations are: – How do we discover data across all the data products automatically? – How do we catalog the data quickly and build a searchable metadata dataset at the organization level? – How do we govern this carefully so that we implement all relevant regulatory and security best practices, e.g. least privilege? Let’s take a detour: imagine you are in an airport, you check in your baggage and at that time you […]
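The three barriers (discovery, cataloging, governance) can be sketched as a tiny in-memory catalog whose search only returns datasets the caller’s roles are allowed to see. All class names, fields and roles here are hypothetical; this is not the API of any real data catalog product.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set

@dataclass(frozen=True)
class DatasetEntry:
    name: str                      # hypothetical dataset id, e.g. "sales.orders"
    owner: str
    tags: FrozenSet[str]           # searchable metadata
    allowed_roles: FrozenSet[str]  # least privilege: only these roles may discover it

class MetadataCatalog:
    def __init__(self) -> None:
        self._entries: List[DatasetEntry] = []

    def register(self, entry: DatasetEntry) -> None:
        """Cataloging step: add one dataset's metadata."""
        self._entries.append(entry)

    def search(self, keyword: str, roles: Set[str]) -> List[str]:
        """Discovery step, filtered by governance: match name or tags, then check roles."""
        keyword = keyword.lower()
        return sorted(
            e.name
            for e in self._entries
            if (keyword in e.name.lower() or keyword in {t.lower() for t in e.tags})
            and roles & e.allowed_roles  # non-empty intersection means access is granted
        )
```

Usage: register `sales.orders` for the `analyst` role and `hr.salaries` for the `hr` role; an analyst searching for "salary" then gets an empty list, because governance is applied at discovery time rather than after the fact.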
First, a question: why does the human brain most often theorize things in terms of 3 or 4, e.g. CAP, CMM, C-4, ACID, BASE, etc.? Most research papers and PPTs have 3-letter or 4-letter acronyms, rarely 5, 6 (hexagons) or 8 (octagons). (Interestingly, 7s are missing, and so are higher numbers, 9+.) A plausible explanation: geometrically closed shapes start with 3 and then 4 sides. A topic can be interpreted by, and is of interest to, the human brain only if it’s a closed space (you want to sustainably keep it going). The reality is a circle or sphere with infinite sides […]
This is an under-utilized usage pattern and a very good use of the tool, as it seems to genuinely save time while producing mostly accurate results, especially for: – Understanding, at a high level, the current offerings in the market: who the players are and what they do. – General requirements, features and regulatory needs. – General tools and solutions available. – Exploring the niche needs of a domain. This post was later published on LinkedIn here.
Why are B2B transactions not fully digital, even when the B’s are digital and/or undergoing a ‘digital transformation’? (My PoV: it is hard, but it need not be.) – Digital B2C has lots of lessons to learn from, yet B2B digital is much, much tougher to crack. – Remember how much time you spent buying your best/next laptop. Maybe 4 hours or more. Now imagine buying medical-grade products as an organization. – Organization classification, product classification, product specs (spec standardization and/or taxonomy), and search and discovery are key. – In B2C, products are ‘kinda’ finished even when it’s a […]
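Why product classification and spec standardization matter for search and discovery can be sketched with a hypothetical taxonomy path plus standardized spec keys: once every seller uses the same keys and values, a buyer’s requirement becomes an exact filter instead of a manual comparison. The SKUs, categories and spec names below are illustrative assumptions, not a real B2B standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Product:
    sku: str
    category: Tuple[str, ...]  # taxonomy path, most general level first
    specs: Dict[str, str] = field(default_factory=dict)  # standardized spec keys -> values

def find_products(
    catalog: List[Product],
    category_prefix: Tuple[str, ...],
    required: Dict[str, str],
) -> List[str]:
    """Return SKUs under a taxonomy branch whose specs match every requirement exactly."""
    n = len(category_prefix)
    return [
        p.sku
        for p in catalog
        if p.category[:n] == category_prefix
        and all(p.specs.get(key) == value for key, value in required.items())
    ]
```

The design choice is that matching is exact on purpose: if one supplier writes `"sterile": "yes"` and another `"sterilized": "true"`, the second product silently drops out, which is precisely the spec-standardization gap the post points at.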
What we usually expect from a database is the ability to store and retrieve data, and often also the ability to quickly analyze data and possibly mutate/process it in an optimal and somewhat non-trivial way. Given these expectations, there are many databases available today: – some designed for structured data, like RDBMSs; – some designed for specific dimensions, like time-series databases and geospatial databases; – some designed around a specific class of data structures, like hash tables (e.g. key-value stores) or graphs (e.g. Neo4j); – some designed for unstructured data, like blob storage services such as GCS, S3 and Azure Blob Storage. There are a few […]
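To make the “specific class of data structures” category concrete, here is a toy contrast between a hash-table (key-value) data model and a graph data model. These are in-memory illustrations with hypothetical class names, not how Neo4j or any real key-value store is implemented.

```python
from collections import defaultdict, deque
from typing import Dict, Optional, Set, Tuple

class KeyValueStore:
    """Hash-table model: average O(1) get/put by exact key; no notion of relationships."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        if key in self._data:
            del self._data[key]
            return True
        return False

class GraphStore:
    """Graph model: nodes joined by labelled, directed edges; queries walk relationships."""

    def __init__(self) -> None:
        self._edges: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def add_edge(self, src: str, label: str, dst: str) -> None:
        self._edges[src].add((label, dst))

    def neighbors(self, node: str, label: str) -> Set[str]:
        return {dst for (lbl, dst) in self._edges.get(node, set()) if lbl == label}

    def reachable(self, start: str, label: str) -> Set[str]:
        """Breadth-first traversal following only edges with the given label."""
        seen: Set[str] = set()
        queue = deque([start])
        while queue:
            for nxt in self.neighbors(queue.popleft(), label):
                if nxt not in seen and nxt != start:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen
```

The key-value store can only answer “what is stored under this key?”, while the graph answers “what can I reach from here?” by traversal instead of repeated lookups or joins; that difference in the underlying structure is what this category of databases is built around.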
If you are interested in a prototype implementation of ‘Right-to-be-Forgotten’ as defined in this previous post, give a like or leave a comment here and, based on the demand, I will try to create a GitHub repo and a YouTube video soon (hopefully before Jan 15th). I intend to keep the video under 5 minutes and the GitHub repo as small as feasible. This post was later published on LinkedIn here.
Can synthetic data help address bias in models? (My PoV: yes.) – Real data is hard (and costly) to acquire, hard to anonymize (which can itself introduce distortions), and almost always biased (can we list a few studies that can prove, or claim, to have unbiased data at global scale?). – Synthetic data does not have the above challenges. – One can control the parameters under which synthetic data gets generated to ensure it is near-realistic (a new word), can be scaled, and can be reasoned about (very important) to correct any issues, including any unintended bias (yes, it may still happen). Note: – IMHO, using real-world data […]
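The “control the parameters” point can be sketched as a seeded generator where group shares and the feature distribution are explicit inputs. This is a hypothetical toy (the function names, the `group`/`score` fields and the Gaussian feature are all assumptions), but it shows two properties the post relies on: group balance is set by construction, and the same seed reproduces the same data, which is what makes the generated set possible to reason about.

```python
import random
from collections import Counter
from typing import Dict, List

def generate_synthetic(
    n: int, group_share: Dict[str, float], mean: float, stdev: float, seed: int = 0
) -> List[dict]:
    """Generate n rows with exact group counts and the same score distribution for every group."""
    if abs(sum(group_share.values()) - 1.0) > 1e-9:
        raise ValueError("group shares must sum to 1")
    rng = random.Random(seed)  # private RNG: same seed -> same rows
    rows: List[dict] = []
    groups = list(group_share)
    for i, group in enumerate(groups):
        # The last group takes the remainder so the total is exactly n.
        count = n - len(rows) if i == len(groups) - 1 else round(n * group_share[group])
        for _ in range(count):
            rows.append({"group": group, "score": rng.gauss(mean, stdev)})
    rng.shuffle(rows)
    return rows

def group_counts(rows: List[dict]) -> Dict[str, int]:
    """Quick check of how balanced the generated data is."""
    return dict(Counter(row["group"] for row in rows))

if __name__ == "__main__":
    data = generate_synthetic(1000, {"A": 0.5, "B": 0.5}, mean=50.0, stdev=10.0, seed=1)
    print(group_counts(data))
```

Because every group draws its score from the same `mean` and `stdev`, no group-dependent bias is built in; if one were introduced unintentionally, it would be visible in the parameters rather than hidden in collected data.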
One of the easiest ways to motivate yourself to do Test-Driven Development is to remember the times: – when you played the role of a UI developer and loved the instant feedback browsers gave for your HTML, CSS, JS and TypeScript code; – when you prepared for coding interviews using AlgoExpert or CodeSignal and loved the instant feedback from running test cases against your algorithm; – when you played the role of a data modeler, using tools like DBeaver or NocoDB, and loved being able to run some queries against the data model to see whether they were easy or tough to reason about and write. Note: – Choose your architecture […]
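The same instant-feedback loop can be reproduced in plain Python with the standard `unittest` module: write the tests first (red), then the simplest code that makes them pass (green). The bracket-matching function is a hypothetical interview-style example, not something from the post.

```python
import unittest

# Step 1 (red): write the tests first, describing the behaviour we want.
class TestIsBalanced(unittest.TestCase):
    def test_empty_string_is_balanced(self):
        self.assertTrue(is_balanced(""))

    def test_nested_pairs_are_balanced(self):
        self.assertTrue(is_balanced("{[()]}"))

    def test_interleaved_pairs_are_not_balanced(self):
        self.assertFalse(is_balanced("([)]"))

    def test_unclosed_brackets_are_not_balanced(self):
        self.assertFalse(is_balanced("(("))

# Step 2 (green): the simplest implementation that makes the tests pass.
def is_balanced(text: str) -> bool:
    """Return True if every bracket is closed by the matching kind in the right order."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack  # leftover openers mean something was never closed

if __name__ == "__main__":
    # exit=False keeps the interpreter running after the test report.
    unittest.main(argv=["tdd-sketch"], exit=False)
```

Running the file before `is_balanced` exists fails every test with a `NameError`; adding the function turns them green, which is the same edit-and-see loop as refreshing a browser or hitting “Run tests” on an interview platform.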