mrbusche.com

Using Poetry with platform argument

by Matt Busche on March 6, 2025

Poetry has recently become our enterprise standard, but it is still lacking some features that pip provides. Thankfully, a pull request has been opened to add the platform flag. Unfortunately, it has not yet been merged.

To get things working, you can install Poetry 2.0.1 and the latest commit from the pull request repo. If you’re feeling risky, you could point to the branch, but we were more comfortable pointing to a specific commit.

python3 -m pip install poetry==2.0.1
python3 -m pip install git+BrandonLWhite/poetry-plugin-bundle-bw.git@640529e823cd2cb678831409e646c1f641279953
python3 -m poetry bundle --platform manylinux2014_x86_64

Book Notes - LLMs in Production

by Matt Busche on December 20, 2024

LLMs in production

A word of warning: Embrace the future now. All new technology meets resistance and has critics; despite this, technologies keep being adopted, and progress continues. In business, technology can give a company an unprecedented advantage. There’s no shortage of stories of companies failing because they didn’t adapt to new technologies. We can learn a lot from their failures.

Like all other skills, your proximity and willingness to get involved are the two main blockers to knowledge, not a degree or ability to notate—these only shorten your journey toward being heard and understood. If you don’t have any experience in this area, it might be good to start by first developing an intuition around what an LLM is and needs by contributing to a project like OpenAssistant. If you’re a human, that’s exactly what LLMs need. By volunteering, you can start understanding what these models train on and why. If you fall anywhere, from no knowledge up to being a professional machine learning engineer, we’ll be imparting the knowledge necessary to shorten your time to understanding considerably.

Our current understanding of language is that language is made up of at least five parts: phonetics, syntax, semantics, pragmatics, and morphology. Each of these portions contributes significantly to the overall experience and meaning being ingested by the listener in any conversation.

Phonetics is probably the easiest for a language model to ingest, as it involves the actual sound of the language. This is where accent manifests and deals with the production and perception of speech sounds, with phonology focusing on the way sounds are organized within a particular language system.
Syntax is the place where current LLMs are highest-performing, both in parsing syntax from the user and generating its own. Syntax is generally what we think of as grammar and word order; it is the study of how words can combine to form phrases, clauses, and sentences. Syntax is also the first place language-learning programs start to help people acquire new languages, especially based on where they are coming from natively.
Semantics are the literal encoded meaning of words in utterances, which changes at breakneck speed in waves. People automatically optimize semantic meaning by only using words they consider meaningful in the current language epoch.

In the seminal paper, “Attention Is All You Need,” [1] Vaswani et al. take the mathematical shortcut several steps further, positing that for performance, absolutely no recurrence (the “R” in RNN) or any convolutions are needed at all.
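To make the "no recurrence" point concrete, here is a minimal sketch of scaled dot-product attention in plain Python. It is a simplification rather than the paper's full architecture: it assumes the queries, keys, and values are the raw token vectors (no learned projections, no multiple heads, no positional encoding), and the example vectors are made up.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Score this query against every key at once; no step-by-step recurrence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # The output is a weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Hypothetical 2-dimensional "token" vectors, used as queries, keys, and values.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = attention(tokens, tokens, tokens)
```

Every position looks at every other position in a single pass, which is what lets the computation run in parallel instead of token by token.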

Reasoning and Acting (ReAct) is a few-shot framework for prompting that is meant to emulate how people reason and make decisions when learning new tasks. [10] It involves a multistep process for the LLM, where a question is asked, the model determines an action, and then observes and reasons upon the results of that action to determine subsequent actions.
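The question, action, observation cycle can be sketched as a small loop. This is a toy illustration, not code from the book: `scripted_model` is a hypothetical stand-in for a real LLM call, the `calculator` tool and the `Action: tool[input]` text format are assumptions, and the few-shot examples a real ReAct prompt would include are left out.

```python
def scripted_model(transcript):
    """Hypothetical stand-in for an LLM: picks the next step from the transcript so far."""
    if "Observation:" not in transcript:
        return "Thought: I need to compute the product.\nAction: calculator[6 * 7]"
    return "Thought: The observation answers the question.\nFinal Answer: 42"

def calculator(expression):
    """Toy tool: evaluates 'a op b' for +, -, and * only."""
    left, op, right = expression.split()
    a, b = float(left), float(right)
    result = {"+": a + b, "-": a - b, "*": a * b}[op]
    return str(int(result)) if result.is_integer() else str(result)

TOOLS = {"calculator": calculator}

def react(question, model, tools, max_steps=5):
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = model(transcript)              # the model reasons and picks an action
        transcript += "\n" + step
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip(), transcript
        action_line = next(line for line in step.splitlines() if line.startswith("Action:"))
        name, arg = action_line[len("Action:"):].strip().rstrip("]").split("[", 1)
        observation = tools[name](arg)        # run the chosen tool
        transcript += f"\nObservation: {observation}"  # feed the result back in
    return None, transcript

answer, transcript = react("What is 6 times 7?", scripted_model, TOOLS)
print(transcript)
```

The important part is that each observation is appended to the transcript, so the next model call can reason over everything that has happened so far.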

PUSHING THE BOUNDARIES OF COMPRESSION: After going down to int4, there are experimental quantization strategies for going even further down to int2. Int2 70B models still perform decently, much to many people’s surprise.
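To get a feel for what each step down in bit width costs, here is a rough sketch of symmetric uniform quantization. It is a toy, per-tensor scheme with made-up weights, not the method from the book or from any particular library; production int4 and int2 schemes are typically more elaborate.

```python
def quantize(weights, bits):
    """Symmetric uniform quantization to signed integers of the given bit width.

    Assumes at least one weight is non-zero, otherwise the scale would be 0.
    """
    qmax = 2 ** (bits - 1) - 1   # 7 for int4, 1 for int2
    qmin = -(2 ** (bits - 1))    # -8 for int4, -2 for int2
    scale = max(abs(w) for w in weights) / qmax
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    return [c * scale for c in codes]

def mean_abs_error(original, restored):
    return sum(abs(a - b) for a, b in zip(original, restored)) / len(original)

# Hypothetical weights, chosen only for illustration.
weights = [0.9, -0.45, 0.1, -0.8, 0.3, 0.05]
for bits in (8, 4, 2):
    codes, scale = quantize(weights, bits)
    err = mean_abs_error(weights, dequantize(codes, scale))
    print(f"int{bits}: codes={codes} error={err:.4f}")
```

With only a handful of integer levels available at int2, most small weights collapse to zero, which is why it is surprising that large models survive it at all.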

We can enhance this approach in an important way that we haven’t touched on yet: using knowledge graphs. Knowledge graphs store information in a structure that captures relationships between entities. This structure consists of nodes that represent objects and edges that represent relationships. A graph database like Neo4j makes it easy to create and query knowledge graphs. And as it turns out, knowledge graphs are amazing at answering more complex multipart questions where you need to connect the dots between linked pieces of information, because they’ve already connected the dots for us.
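Here is a small sketch of the nodes-and-edges idea and a multipart question answered by following relationships. It uses an in-memory dictionary instead of a real graph database, and the entities, relation names, and `KnowledgeGraph` class are all hypothetical.

```python
from collections import defaultdict

class KnowledgeGraph:
    def __init__(self):
        # Maps each subject node to a list of (relation, object) edges.
        self.edges = defaultdict(list)

    def add(self, subject, relation, obj):
        self.edges[subject].append((relation, obj))

    def follow(self, start, relations):
        """Follow a chain of relations from start, returning every node reached."""
        frontier = {start}
        for relation in relations:
            frontier = {
                obj
                for node in frontier
                for rel, obj in self.edges.get(node, ())
                if rel == relation
            }
        return frontier

graph = KnowledgeGraph()
graph.add("Ada", "works_at", "Acme")
graph.add("Acme", "headquartered_in", "Des Moines")
graph.add("Des Moines", "located_in", "Iowa")

# "Which state is Ada's employer headquartered in?" takes three hops.
state = graph.follow("Ada", ["works_at", "headquartered_in", "located_in"])
print(state)
```

No single fact answers the question, but the graph already links the facts together, so answering it is just a walk along the edges.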

Links - Morgan Stanley uses AI evals to shape the future of financial services

by Matt Busche on December 6, 2024

Sharing this post on Morgan Stanley and AI from the OpenAI blog.

“This technology makes you as smart as the smartest person in the organization. Each client is different, and AI helps us cater to each client’s unique needs.” Jeff McMillan, Head of Firmwide AI at Morgan Stanley

Evals

To evaluate GPT-4’s performance against their experts, Morgan Stanley ran summarization evals to test how effectively the model condensed vast amounts of intellectual capital and process-driven content into concise summaries. Advisors and prompt engineers graded AI responses for accuracy and coherence, allowing the team to refine prompts and improve output quality.

The biggest takeaway

The eval framework wasn’t static; it evolved as the team learned.

This should be expected with Generative AI: your eval framework should not be static, and your project is never done.

Expanding the corpus

This is massive and extremely well done.

“We went from being able to answer 7,000 questions to a place where we can now effectively answer any question from a corpus of 100,000 documents,” says David Wu, Head of Firmwide AI Product & Architecture Strategy at Morgan Stanley.

Adoption

over 98% adoption in wealth management

!! !! !!

Their strong eval framework has also unlocked a flywheel for future solutions and services.

I am currently reading The Value Flywheel Effect so this really resonated with me.

In my opinion, they’ve tackled the two hardest obstacles: they’ve created a very successful project AND gotten widespread adoption. When stakeholders are bought in like this, releasing additional projects should have a much lower barrier to entry.

Links - Lucky Maverick | The Secret to Success - Mimic Evolution

by Matt Busche on December 6, 2024

Sharing this post from Jonathan Bales on Evolution.

most decisions really don’t matter.

What to Learn from Evolution

You can be dumb and still find huge success, as long as you

Perfect the trial-and-error cycle

The faster you can make the trial-and-error cycle work, the quicker you can find success.

The best way to speed up your learning curve: be extreme.

Don’t suppress chaos

Less Data

You don’t always need more data. Usually, you need less.

The more data we have, the more likely we are to drown in it. - Taleb

This quote is so true. We wait to make a decision until we have way too much information. We could have made an educated decision months ago, but we continually delay to reduce our risk of failure. If you don’t take chances, you’ll never fail, but you’ll also never learn to overcome adversity.

Removing Fragility

Avoid risk of ruin

You should take on all kinds of risk when there’s nothing to lose.

Admit how little you know

Overconfidence is fragility.

Amazon Nova foundation models

by Matt Busche on December 6, 2024

Amazon announced their Amazon Nova foundation models on December 3, 2024. The main features are low cost and low latency.

Amazon Nova

Micro

very low cost
text-only
great for summarization, translation, and classification
128k token context

Lite

Multimodal
- Multiple images
- Up to 30 minutes of video
300k token context
Fine-tuning with model distillation is available

Pro

Similar to Lite but with more accuracy
Serves as a teacher model to distill custom variants of Lite and Micro

Premier

Availability in early 2025

Simon Willison compares them to the Google Gemini family of models.

Costs for Micro are $0.035 per million input tokens and $0.14 per million output tokens, $0.06 and $0.24 for Lite, and $0.80 and $3.20 for Pro. The Micro model is slightly cheaper than Gemini 1.5 Flash-8B and appears to be the cheapest model available.
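For a quick comparison on your own workload, here is a small estimator built from the prices quoted above. The token counts in the example are made up, and the prices are the ones listed at the time of writing, so check current pricing before relying on the numbers.

```python
# USD per million tokens (input, output), as quoted in the text above.
PRICES = {
    "micro": (0.035, 0.14),
    "lite": (0.06, 0.24),
    "pro": (0.80, 3.20),
}

def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated cost in USD for one workload."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Hypothetical workload: 2 million input tokens and 500 thousand output tokens.
for model in PRICES:
    print(f"{model}: ${estimate_cost(model, 2_000_000, 500_000):.4f}")
```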

Links - No Priors Ep. 91 | With Cohere Co-Founder and CEO Aidan Gomez

by Matt Busche on December 6, 2024

This interview with Cohere CEO Aidan Gomez is a must watch (or listen) for anyone interested in AI. Here are some key takeaways:

The role of “luck and chance” in his journey.
The smallest nuance can significantly impact language model outputs, highlighting the need for robust and reliable models (Cohere is trying to build models with this in mind).
Tailoring AI solutions to specific needs, like generating medically relevant summaries for a doctor based on a simple phrase like “My knee hurts.” vs requiring them to look through 20 years of patient notes.
Listening to customers is crucial for identifying valuable applications. We are not even close to having all the answers with LLMs.
Begin with the most cost-effective solutions and gradually build complexity. You shouldn’t be starting from scratch in most use cases.
Waiting 6-12 months can dramatically reduce development costs (specifically referring to building an LLM). Staying on the bleeding edge may be profitable but waiting 6-12 months and taking advantage of everything learned is more profitable.

Links - Seth Godin Severe weather alert

by Matt Busche on December 2, 2024

Like a lot of Seth Godin posts, this one is short but impactful: Severe Weather Alert.

He discusses getting an alert every day about severe weather in his area, but now it’s just “weather.”

We think that regularly alerting people to something is likely to get their attention again and again.

The more an application cries wolf, the more likely we are to ignore it.

Amazon Bedrock Updates for November 2024

by Matt Busche on December 1, 2024

Reducing hallucinations in large language models with custom intervention using Amazon Bedrock Agents

Amazon Bedrock Guardrails offers hallucination detection and grounding checks (existing functionality)
You can develop a custom hallucination score using RAGAS to reject generated responses (requires SNS)

Amazon Bedrock Flows is now generally available

Prompt Flows has been renamed to Amazon Bedrock Flows (Microsoft also uses the name Prompt Flows)
You can now filter harmful content using the Prompt node and the Knowledge base node
Improved traceability - you can now quickly debug workflows with traceability of inputs and outputs

Prompt Optimization on Amazon Bedrock

This is exactly what it sounds like: you provide a prompt and it is optimized for use with a specific model, which can result in significant improvements for Gen AI tasks.

Sharing - RIP to RPA - The Rise of Intelligent Automation

by Matt Busche on December 1, 2024

Notes from this article by Andreessen Horowitz: RIP to RPA: The Rise of Intelligent Automation

Traditionally, robotic process automation (RPA) was a hard-coded “bot” that mimicked the exact keystrokes necessary to complete a task. With LLMs, however, the original vision of RPA is now possible. An AI agent can be prompted with an end goal, e.g. book a flight from DSM to ORD on these dates, and will have the correct agents available to complete the task.

There is a large opportunity for startups in this space, because no existing product meets the original vision of RPA. There are two main areas:

horizontal AI enablers that execute a specific function for a broad range of industries, and vertical automation solutions that build end-to-end workflows tailored to specific industries.

Future of Business - Palo Alto Networks’ Nikesh Arora on Managing Risk in the Age of AI

by Matt Busche on November 30, 2024

I really enjoyed this podcast with Nikesh Arora, CEO of Palo Alto Networks, where he discussed how much of their strategy is tied to acquisition vs trying to build everything themselves in-house. He had some other insights that probably aren’t revolutionary, but I appreciated his openness in this interview.

Here are my key takeaways:

The AI Revolution and Cybersecurity

With practically everything being internet connected, the potential points of vulnerability for cyberattacks are enormous
Bad actors are increasingly using AI to infiltrate systems, which requires companies like Palo Alto Networks to use AI to counter their attacks
He sees AI as a productivity tool that will augment human work, taking over repetitive tasks and allowing employees to focus on more enjoyable tasks

Acquisition and Integration

Palo Alto is acquiring innovative cybersecurity companies to stay ahead of threats
He stresses the importance of empowering the acquired teams and providing them with resources

Concerns and Risks

He discusses the importance of a zero trust security model, treating every user and device with the same level of scrutiny
They talk about the (obvious) potential for GenAI to be used maliciously
Arora anticipates regulations focusing on transparency, guardrails, and control of critical processes
He strongly emphasizes the importance of collaboration between industry and regulators.