AI in HR

I Built a Job-Matching Algorithm. Now I Understand Why LinkedIn Struggles.

By Dr. Tom Ron · February 16, 2026 · 9 min read

Why job recommendations are so bad, and why fixing them requires psychology, not just better embeddings.

I recently built a job-candidate matching system from scratch. Not a toy. A production system that takes real resumes and matches them against tens of thousands of real job postings. The kind of system that has to work well enough to affect real people’s careers.

Before I built it, I was like everyone else. I’d open LinkedIn, see recommendations for roles that had nothing to do with my background, and think: how hard can this be? I have a profile. You have job descriptions. Just… match them.

Now I get it. It’s genuinely, almost absurdly hard. And my frustration has turned into something closer to empathy.

Here are the five problems that surprised me most.

1. The Language Problem

The first thing you learn when you try to match resumes to jobs is that job ads are chaos.

I don’t mean they’re poorly written (though many are). I mean there is zero standardization in how companies describe the same role. Two postings for “Marketing Manager” might share almost no vocabulary. One reads like a corporate policy document. The other reads like a casual Slack message. One lists 15 requirements in bullet points. The other buries everything in a wall of text with no structure at all.

Now multiply that by tens of thousands of postings. Every company has its own tone, its own format, its own internal jargon. Some list required skills explicitly. Others describe responsibilities and expect you to infer the skills. Some include salary information, seniority level, and team size. Others give you a title and two sentences.

The challenge isn’t reading one job ad. It’s normalizing thousands of them into something comparable. You need to extract the same structured information from wildly different inputs, then represent it in a way that makes “Marketing Manager at a 50-person startup” and “Marketing Manager at a Fortune 500 company” both meaningful and distinct.

I threw NLP techniques at this. Named Entity Recognition to pull out skills, titles, and qualifications. LLM agents to parse unstructured text into structured templates. It works, mostly. But the sheer variety of how people write about the same job is humbling. Every time I thought I’d handled all the edge cases, a new batch of job ads proved me wrong.
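To make the normalization step concrete, here is a minimal sketch of mapping an unstructured posting onto a fixed template. The template fields, seniority hints, and regex heuristic are all invented for illustration; a real pipeline would use NER or an LLM pass, as described above, rather than rules this crude.

```python
import re

# Hypothetical template: every posting, however it's written, gets
# reduced to the same three fields so postings become comparable.
SENIORITY_HINTS = {
    "junior": "junior", "entry": "junior",
    "senior": "senior", "lead": "senior",
    "vp": "executive", "head of": "executive",
}

def normalize_posting(raw_text: str) -> dict:
    """Map an unstructured job ad onto a fixed template (best effort)."""
    lower = raw_text.lower()
    seniority = next(
        (level for hint, level in SENIORITY_HINTS.items() if hint in lower),
        "unspecified",
    )
    # Skills are often bulleted; grab short bullet lines as skill candidates.
    skills = re.findall(r"^[\-\*]\s*([^\n]{2,40})$", raw_text, flags=re.M)
    title = raw_text.strip().splitlines()[0] if raw_text.strip() else ""
    return {"title": title, "skills": skills, "seniority": seniority}

ad = """Senior Marketing Manager
We move fast. You should too.
- SEO and content strategy
- Paid acquisition
"""
print(normalize_posting(ad))
```

The point is not the heuristics, which break constantly, but the shape of the output: once every posting lands in the same schema, everything downstream becomes possible.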

This is the unsexy foundation of the whole problem. Before you can match anything, you have to understand what you’re matching. And understanding thousands of unstructured, inconsistent, sometimes contradictory job postings is a project in itself.

2. The Taxonomy Problem

Once you’ve parsed the job ads, you hit the next wall: what do these roles actually mean?

Is “Data Scientist” the same as “AI Engineer”? Sometimes. Is a personal driver qualified to drive a truck? Depends on the license and the context, but the underlying skills overlap more than you’d think. Is “Legal Consultant” the same as “Attorney”? In some jurisdictions, yes. In others, they’re completely different roles with different qualifications.

Job titles are almost meaningless on their own. They’re labels that companies assign based on internal conventions, not on any shared standard. A “VP of Engineering” at a five-person startup and a “VP of Engineering” at Google have almost nothing in common except the words on their business card.

This is where my background in industrial-organizational psychology turned out to be more useful than any ML technique I’d learned. I/O psychology has spent decades building frameworks for exactly this problem. Holland’s occupational codes classify jobs by the type of work and the type of person who thrives in that role. O*NET provides detailed taxonomies of skills, abilities, and work activities for thousands of occupations. Competency frameworks map out what people actually need to know and do, regardless of what their title says.

These frameworks gave me something that pure NLP couldn’t: a principled way to say “these two roles are related” or “these two roles sound similar but aren’t.” Without that psychological scaffolding, you’re just doing string matching with extra steps. The taxonomy problem isn’t a data problem. It’s a domain knowledge problem. And the domain is human work, which means the experts are psychologists, not engineers.
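A toy version of what taxonomy-backed relatedness buys you: compare roles by their underlying competency sets instead of their titles. The titles and competency codes below are invented for the example; a real system would load thousands of occupations from O*NET or a similar framework.

```python
# Hypothetical mini-taxonomy mapping titles to competency codes.
COMPETENCIES = {
    "Data Scientist":   {"statistics", "programming", "data_modeling"},
    "AI Engineer":      {"programming", "data_modeling", "ml_ops"},
    "Legal Consultant": {"legal_research", "client_advisory"},
}

def relatedness(role_a: str, role_b: str) -> float:
    """Jaccard overlap of competency sets: 1.0 identical, 0.0 disjoint."""
    a, b = COMPETENCIES[role_a], COMPETENCIES[role_b]
    return len(a & b) / len(a | b)

print(relatedness("Data Scientist", "AI Engineer"))      # 0.5
print(relatedness("Data Scientist", "Legal Consultant")) # 0.0
```

Notice that the score says nothing about whether the title strings look alike; "Data Scientist" and "AI Engineer" share no words but half their competencies.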

3. The Similarity Problem

Let’s say you’ve parsed all the job ads and classified everything into a proper taxonomy. Now you need to measure fit. How similar is this candidate to this role?

The standard approach is embedding-based similarity. You encode the candidate and the job into vectors, compute cosine similarity, and rank by score. It sounds elegant. In practice, it’s where I spent some of the most frustrating weeks of the project.

First, there’s the question of what to embed. The full resume? A structured summary? Individual skills? Each choice changes your results dramatically. Embed too much and you get noise. Embed too little and you lose context.
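The scoring arithmetic itself is the easy part. Here it is with hand-made 3-dimensional "embeddings" standing in for real model outputs; production models (OpenAI, Gemini, BERT variants) return high-dimensional vectors, but the cosine computation is identical.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity: dot product over the product of norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors: in reality these come from embedding whatever text
# granularity you chose (full resume, summary, or individual skills).
candidate = [0.9, 0.1, 0.3]
jobs = {
    "backend_engineer": [0.8, 0.2, 0.4],
    "sales_lead":       [0.1, 0.9, 0.2],
}
ranked = sorted(jobs, key=lambda j: cosine(candidate, jobs[j]), reverse=True)
print(ranked)  # most similar job first
```

Everything hard lives outside this function: what text you embedded, with which model, in which language.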

Then there’s the model. I tested multiple embedding models, including options from OpenAI, Google’s Gemini, and multilingual BERT variants. Their performance differed significantly, and the ranking of which model worked best wasn’t what I expected from reading benchmarks. Benchmarks are trained and tested mostly on English data. My data was in Hebrew.

That’s the other thing nobody warns you about. If you’re working in a non-English language, especially one with less NLP tooling, less training data, and different morphological structure, your mileage will vary wildly from what the leaderboards suggest. Hebrew is morphologically rich, written right-to-left, and has far fewer pretrained models tuned for it. Every technique I imported from English-language NLP research needed adaptation, sometimes minor, sometimes substantial.

And even once you’ve picked a model, you’re computing similarity across thousands of occupational categories. At that scale, you need to think carefully about efficiency. Brute-force pairwise comparison doesn’t scale. You end up building retrieval pipelines, RAG-based classification systems, and layered filtering approaches just to make the computation tractable. The math is simple. The engineering is not.
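The layered-filtering idea can be sketched in a few lines: a cheap coarse filter prunes the pool before the expensive pairwise scorer runs. The data shapes and the stage-two stub below are assumptions for illustration; in production the second stage would be an embedding model or LLM call, not a skill-overlap count.

```python
def coarse_filter(candidate_category: str, jobs: list[dict]) -> list[dict]:
    """Stage 1 (cheap): keep only jobs in the candidate's taxonomy category."""
    return [j for j in jobs if j["category"] == candidate_category]

def expensive_score(candidate: dict, job: dict) -> float:
    """Stage 2 stub: skill overlap, standing in for an embedding model."""
    shared = set(candidate["skills"]) & set(job["skills"])
    return len(shared) / max(len(job["skills"]), 1)

candidate = {"category": "marketing", "skills": ["seo", "analytics"]}
jobs = [
    {"id": 1, "category": "marketing",   "skills": ["seo", "copywriting"]},
    {"id": 2, "category": "engineering", "skills": ["python"]},
    {"id": 3, "category": "marketing",   "skills": ["analytics", "seo"]},
]
pool = coarse_filter(candidate["category"], jobs)  # prunes before scoring
best = max(pool, key=lambda j: expensive_score(candidate, j))
print(best["id"])
```

With tens of thousands of postings, the stage-one prune is what keeps the expensive model calls affordable.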

4. The Qualification Problem

This one surprised me the most, and it’s the one that convinced me this problem needs psychology, not just engineering.

Most matching systems treat fit as a spectrum: the more overlap between a candidate and a job, the higher the score. But that’s wrong. Fit isn’t linear. It’s curvilinear.

A VP of Sales applying for a junior sales development role isn’t a “95% match” because they have all the required skills and more. They’re a terrible match. They’ll be bored in a week, frustrated in a month, and gone in three. Overqualification is a real phenomenon, well-studied in I/O psychology, and it predicts turnover, dissatisfaction, and counterproductive work behavior just as reliably as underqualification does.

Similarly, someone with five years of backend engineering experience isn’t a good match for a CTO role just because the skill keywords overlap. The seniority gap matters. The scope of responsibility matters. The trajectory matters.

A good matching system has to model this. It can’t just ask “does this person have these skills?” It has to ask “is this role a reasonable next step for this person, given where they are in their career?” That’s a fundamentally psychological question. It requires understanding motivation, development trajectories, and person-environment fit at a level that keyword overlap will never capture.
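One way such a curvilinear signal could look, as a hedged sketch: damp the skill-overlap score by a Gaussian penalty on the seniority gap, so both overqualification and underqualification pull the score down. The level scale and width parameter here are illustrative, not the actual model.

```python
import math

# Hypothetical ordinal seniority scale.
LEVELS = {"junior": 1, "mid": 2, "senior": 3, "vp": 4}

def fit_score(skill_overlap: float, candidate_level: str,
              job_level: str, width: float = 1.0) -> float:
    """Curvilinear fit: skill overlap damped by the seniority gap.

    The Gaussian penalty is 1.0 when the gap is zero and shrinks
    symmetrically as the candidate is over- or under-leveled.
    """
    gap = LEVELS[candidate_level] - LEVELS[job_level]
    penalty = math.exp(-(gap ** 2) / (2 * width ** 2))
    return skill_overlap * penalty

# A VP with perfect skill overlap is NOT a top match for a junior role:
print(fit_score(1.0, "vp", "junior"))  # heavily penalized despite overlap
print(fit_score(0.8, "mid", "mid"))    # modest overlap, zero gap, wins
```

The exact penalty shape matters less than the principle: the score must peak at a reasonable career step and fall off in both directions, which is exactly what a linear overlap score cannot do.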

I ended up building explicit overqualification and underqualification signals into the algorithm, drawing on decades of I/O psychology research about job satisfaction and person-job fit. It made the matches dramatically better. But it’s the kind of improvement you’d never discover if you approached this purely as an information retrieval problem.

5. The Classification at Scale Problem

The last major challenge is one that anyone working with LLMs at scale will recognize: classification drift.

LLMs are remarkably good at classifying unstructured text. Give a well-prompted model a job description and ask it to categorize the role, and it’ll do a solid job. The problem is consistency at scale.

When you’re classifying into thousands of categories, across tens of thousands of documents, small inconsistencies compound. The model might classify a “Data Analyst” role as “Business Intelligence” in one context and “Data Science” in another, depending on the surrounding text. It might drift toward popular categories and away from rare but valid ones. It might interpret the same Hebrew term differently based on subtle context shifts.

The deeper issue is that classification at this scale requires a feedback loop. You need to monitor drift, catch misclassifications, and continuously refine your prompts and reference sets. It’s less like training a model and more like managing a team of junior analysts who are mostly great but occasionally confident and wrong.
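A minimal version of that feedback loop, assuming batches of LLM-assigned labels: compare the label distribution of a new batch against a reference batch and flag categories whose share shifted beyond a threshold. It is a crude tripwire, not drift detection proper, but it tells you where to start reading individual classifications.

```python
from collections import Counter

def label_shares(labels: list[str]) -> dict[str, float]:
    """Fraction of the batch assigned to each label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}

def drifted(reference: list[str], current: list[str],
            threshold: float = 0.10) -> list[str]:
    """Labels whose batch share moved more than `threshold`."""
    ref, cur = label_shares(reference), label_shares(current)
    flagged = [
        label
        for label in set(ref) | set(cur)
        if abs(ref.get(label, 0.0) - cur.get(label, 0.0)) > threshold
    ]
    return sorted(flagged)

ref_batch = ["data_science"] * 50 + ["bi"] * 50
new_batch = ["data_science"] * 70 + ["bi"] * 30  # drifting toward one label
print(drifted(ref_batch, new_batch))
```

Flagged categories then feed back into prompt refinement and the reference examples, which is the "managing junior analysts" part of the job.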

What I Learned

Building this system changed how I think about the job-matching problem. It’s not a search problem. It’s not even really an NLP problem. It’s a psychology problem dressed up as an engineering problem.

The technical tools matter. Embeddings, NER, LLMs, RAG, all of it is necessary. But none of it is sufficient. The hard part isn’t computing similarity. It’s defining what similarity means in the context of human work, human motivation, and human careers. That requires domain expertise that most engineering teams don’t have and most product roadmaps don’t account for.

I started this project mildly annoyed at LinkedIn for its bad recommendations. I’m ending it with genuine empathy. Every platform trying to match people to jobs is wrestling with these same problems, usually at far greater scale and across more languages than I was dealing with. The fact that their recommendations are mediocre isn’t a sign of laziness or incompetence. It’s a sign that this problem is legitimately one of the hardest applied AI challenges out there.

The good news is that it’s solvable. But solving it requires teams that combine engineering talent with deep domain knowledge in work psychology. It requires treating I/O psychology not as a nice-to-have but as a core competency. And it requires accepting that the best embedding model in the world won’t save you if you don’t understand what you’re trying to measure.

I’ll go deeper into the technical implementation in a future post. For now, I just wanted to share the conceptual landscape, because I think too many teams jump to the engineering before they understand the psychology. And that’s exactly backwards.

Tom Ron, PhD, is a Data Scientist specializing in AI-powered assessment tools. He holds a PhD in Industrial-Organizational Psychology. His work on AI ethics in HR has been published in Technology in Society. Learn more at tomron.ai


I Built a Job-Matching Algorithm. Now I Understand Why LinkedIn Struggles. was originally published in Towards AI on Medium.