Mastering LookML: A Practical Guide to High-Performance Data Modeling
Good LookML does not just answer questions; it makes them faster, cleaner, and easier to ask. If you have worked with Looker long enough, you know that LookML is more than just a modeling language; it is the foundation that turns raw data into reliable, business-ready insights. But building a scalable LookML model is not just about making things “work.” It is about writing clean, consistent, and maintainable code that your future self and your teammates will thank you for.
In this guide, we will go beyond the basics. We will break down Dimension and Measure syntax, Explore and Join definitions, and advanced techniques for Derived Tables, Persistent Derived Tables (PDTs), and caching strategies. Along the way, you will see not just what each parameter does, but exactly how to write it in LookML.
Whether you are building your first model or refactoring a complex analytics layer, these patterns and best practices will help you write LookML that is clear, reusable, and optimized for performance.
1. Dimension and Measure Syntax
1.1 Defining Dimensions
A dimension represents a data attribute (column) from your source. When defining one, follow these best practices:
- Naming convention: Use lowercase letters and underscores for spaces (e.g.,
full_name). - Code organization: Add new dimensions after existing ones in the view file for easier navigation.
Example:
dimension: first_name { sql: ${TABLE}.first_name ;; }
|
Key parameters for dimensions:
type- string – Default for text values.
- number – For numeric data and math computations.
- yesno – Boolean output (“Yes”/”No”).
- tier – Bucket values into ranges.
dimension_group– Creates multiple dimensions from one field.- type: time – For date/time fields (requires timeframes)
- Reference with
${created_date}. - type: duration – For time differences (requires intervals,
sql_start,sql_end) - Use instead of hard-coded
DATE_DIFF()for flexibility.
1.2 Defining Measures
A measure is an aggregated version of your data.
Common measure types:
sum– Adds values.average– Calculates the mean.count– Counts rows, uses primary key by default.count_distinct– Counts unique values.
Example:
measure: total_sales { type: sum sql: ${sale_price} ;; }
|
Additional options:
value_format_name– Display format (e.g.,usd_0for currency).filters– Apply conditions for a measure.
Example:
measure: new_user_count { type: count filters: [users.is_new_user: "Yes"] }
|
- Derived from other measures: In LookML, you are not limited to building measures directly from raw fields; you can also create new measures from existing measures, as long as the new measure’s type is set to number.
This is particularly useful when you want to calculate ratios, percentages, or other KPIs without repeating aggregation logic. By referencing other measures instead of re-writing SQL, you ensure your calculations stay consistent and automatically reflect any future changes to the original measures.
Example:
measure: profit_margin { type: number sql: ${total_profit} / ${total_revenue} ;; value_format_name: percent_2 }
|
In this example, total_profit and total_revenue are both pre-defined measures. By dividing them, you create a calculated KPI that is:
- Easier to maintain, no repeated SQL.
- Less error-prone, inherits logic from the original measures.
- Automatically updated if the source measures change.
💡 Pro Tip: Always reference measures with ${measure_name} instead of ${TABLE}.column_name. When doing this, it keeps your logic centralized and reduces the risk of mismatched calculations.
1.3 Referencing Fields
LookML uses substitution operators to reuse logic and reduce hard-coding.
${field_name}– References an existing dimension/measure in the same view.
Example:
sql: concat(${first_name}, " ", ${last_name}) ;;
|
${TABLE}.column_name– Direct database column reference, use when first defining a field.${view_name.field_name}– Reference fields from another joined view.
2. Explore and Join Syntax
2.1 Defining Explores
An Explore is a starting point for analysis, containing pre-joined views.
Example:
explore: orders { }
|
The Explore name must match an existing view name.
2.2 Defining Joins
Joins link additional views into an Explore.
Example:
join: users { type: left_outer sql_on: ${order_items.user_id} = ${users.id} ;; relationship: many_to_one }
|
Key join parameters:
type:left_outer(default),inner,full_outer,cross.sql_on: Join condition using LookML field references.relationship: Defines cardinality for symmetric aggregation (one_to_one,one_to_many,many_to_one,many_to_many).from: Used for join aliases when joining the same view multiple times.
2.3 Primary Keys and Filtering
primary_key: yes– Always define one in each view for symmetric aggregation.sql_always_where– Adds a hidden, unremovableWHEREclause.sql_always_having– Same as above, but inHAVINGclause.always_filter– Visible, required filter users cannot remove.conditionally_filter– Visible filter that can be removed if another filter is applied.
Summary Table: Filtering Types
| Feature | SQL Clause Applied In | Visible to User? | Editable by User? | Use Case |
|---|---|---|---|---|
sql_always_where |
WHERE |
❌ No | ❌ No | Apply a permanent row-level filter to restrict data (e.g., status = 'active'). |
sql_always_having |
HAVING |
❌ No | ❌ No | Apply a permanent aggregate-level filter (e.g., COUNT(*) > 0). |
always_filter |
Looker UI filter | ✅ Yes | ⚠️ Yes (can change value but cannot remove filter) |
Provide a default, required filter that guides analysis but still allows adjustments. |
conditionally_filter |
Conditional logic in LookML | ✅ Yes | ✅ Yes | Show a filter only if another specified filter is applied, useful for dependent filtering logic. |
3. Derived Table Syntax
Derived tables are custom tables defined in LookML that do not exist in the underlying database. They let you build reusable transformations directly in your model without altering the source schema.
💡 To learn more about best practices for Looker, check out our previous blog here.
3.1 SQL-Derived Tables (as Views)
A SQL-Derived Table (SDT) is defined with raw SQL inside LookML. You can create them via SQL Runner or directly in the IDE, and Looker will generate a .view file for you. You can treat this file like any other view in your project.
Why use SQL-Derived Tables?
- Ideal for transformations that are simpler to express in SQL than in LookML.
- Useful when you need a one-off pre-aggregation.
- Helpful for complex joins or CASE logic that do not translate well into Explore joins.
Example:
view: order_facts { derived_table: { sql: SELECT order_id, SUM(sale_price) AS total_sales FROM order_items GROUP BY order_id ;; } primary_key: order_id measure: total_sales { type: sum sql: ${TABLE}.total_sales ;; }}
|
3.2 Native Derived Tables (NDTs)
Native Derived Tables are written entirely in LookML and leverage existing LookML objects (dimensions, measures, filters).
Why use NDTs?
- Avoids repeating SQL logic; everything references LookML fields.
- Automatically inherits updates when the source fields change.
- Easier to maintain and portable across models.
Example:
view: top_customers { derived_table: { explore_source: orders { column: customer_id { field: orders.customer_id } column: total_spent { field: orders.total_spent } filters: [orders.order_status: "completed"] } } primary_key: customer_id}
|
Key Parameters:
explore_source– The Explore to build from (equivalent to SQLFROM).column– Output field in the derived table; can be renamed or aliased.filters– AddWHERE/HAVINGconditions.bind_filters– Pass user filters from Explore into the NDT.derived_column– Create a new calculated column inside the NDT.expression_custom_filter– Build complex filter logic withAND/OR.
3.3 Persistent Derived Tables (PDTs)
PDTs store derived tables in your database, so queries run against a materialized version instead of regenerating every time. This is critical for performance-heavy queries or when you need consistent snapshots of data.
Why use PDTs?
- Speeds up queries that would otherwise require large scans or expensive joins.
- Ensures consistency across users by serving the same materialized data.
- Useful for cascading logic where derived tables depend on each other.
Persistence Strategies:
Option 1 – Using datagroup_trigger: Recommended; rebuilds based on caching policy.
view: order_summary { derived_table: { sql: SELECT customer_id, COUNT(*) AS order_count FROM orders GROUP BY customer_id ;; } datagroup_trigger: orders_datagroup}
|
Option 2 – Using sql_trigger_value: Rebuilds when query result changes.
sql_trigger_value: SELECT MAX(updated_at) FROM orders ;;
|
Option 3 – Using persist_for: Keeps PDT fresh for a fixed time window.
persist_for: "8 hours"
|
3.4 Ephemeral Derived Tables (EDTs)
Ephemeral Derived Tables are temporary; they do not persist in the database. Instead, they exist only within the SQL query Looker generates, often as a Common Table Expression (CTE).
Why use EDTs?
- Great for lightweight, intermediate transformations.
- Avoids database storage costs.
- Perfect for logic that is only needed once and will not be reused.
Example:
view: temporary_calc { derived_table: { sql: SELECT user_id, COUNT(*) AS event_count FROM events GROUP BY user_id ;; ephemeral: yes }}
|
Summary Table: Derived Table Types
| Type | Defined In | Persists in Database? | Use Case |
|---|---|---|---|
| SQL-Derived Table | Raw SQL in .view | ❌ No | One-off transformations or complex SQL logic |
| Native Derived Table (NDT) | LookML | ❌ No | Reusable LookML-based aggregations |
| Persistent Derived Table (PDT) | SQL or LookML | ✅ Yes | Heavy queries or large aggregations needing caching |
| Ephemeral Derived Table (EDT) | SQL in .view with ephemeral: yes | ❌ No | Temporary calculations or CTE-style queries |
4. Caching and Datagroup Syntax
Datagroups are Looker’s way of controlling when cached query results or Persistent Derived Tables (PDTs) are refreshed. Think of them as centralized cache policies that multiple Explores and PDTs can share, ensuring your data stays fresh, performant, and consistent without overloading the database.
Why they matter:
- Performance: Reduce redundant queries and save database costs.
- Consistency: All dependent Explores and PDTs refresh on the same schedule.
- Control: Tie cache refresh to a database change (via
sql_trigger) rather than an arbitrary time.
4.1 Defining a Datagroup
Datagroups define cache refresh rules.
Best Practice:
Always define both max_cache_age (time-based refresh) and sql_trigger (event-based refresh).
max_cache_ageensures data will not be stale for too long.sql_triggerdetects actual source table changes and refreshes immediately when needed.
Example:
datagroup: orders_datagroup { max_cache_age: "1 hour" sql_trigger: SELECT MAX(order_id) FROM orders ;;}
|
max_cache_age: Forces a refresh every hour, even if no data change is detected.sql_trigger: Checks the highestorder_id; if it changes, the cache refreshes immediately.
4.2 Applying Datagroups
- At the Explore level: Control Explore-level caching.
explore: orders { persist_with: orders_datagroup}
|
- For PDT refresh: Tie PDT rebuilds to a datagroup.
datagroup_trigger: orders_datagroup
|
- Fixed-time caching (no datagroup): When datagroups are not supported.
persist_for: "2 hours"
|
Summary Table: Datagroup vs. persist_for
| Feature | Detects Data Changes? | Time-Based Refresh | Use Case |
|---|---|---|---|
| Datagroup | ✅ Yes (via sql_trigger) |
✅ Yes (max_cache_age) |
Dynamic freshness tied to DB changes, shared cache policy. |
persist_for |
❌ No | ✅ Yes | Fixed refresh when datagroups are not supported. |
Mastering LookML – Final Summary Table
| Component | Purpose | SQL Output | Best Practice |
|---|---|---|---|
| Explore | Combines views into an analysis starting point | FROM + JOIN |
Only join what is needed, use sql_always_where for security filters |
| Dimensions | Groupable fields, no aggregation | SELECT + GROUP BY |
Use business-friendly names, avoid raw DB column names |
| Measures | Aggregated metrics | SELECT with AGG() |
Reuse calculations across explores, avoid hardcoding in SQL |
| Dimension Filters | Filters applied before aggregation | WHERE |
Use for raw-level filtering to improve query speed |
| Measure Filters | Filters applied after aggregation | HAVING |
Only use when aggregation is necessary |
| SQL-Derived Tables (SDTs) | Custom SQL logic in LookML | Depends on SQL | Use for complex logic not possible in LookML fields |
| Native Derived Tables (NDTs) | Derived tables using LookML references | Auto-generated SQL | Easier to maintain, automatically adapts to schema changes |
| Persistent Derived Tables (PDTs) | Cached tables in DB for speed | Physical table in DB | Use for heavy queries, refresh with datagroups |
| Ephemeral Derived Tables (EDTs) | Temporary tables for query context | CTE in query | Great for lightweight logic without persistence |
| Datagroups | Control cache refresh rules | N/A | Always set both max_cache_age & sql_trigger |
| Persistence Controls | Keep PDTs fresh | N/A | Match refresh to data update frequency |
⭐⭐⭐
LookML is not just a modeling language; it is the bridge between your raw data and the business insights that drive action. By mastering concepts like Explores, dimensions, measures, derived tables, and datagroups, you unlock the ability to deliver fast, accurate, and consistent analytics at scale. Whether you are optimizing query performance, enforcing governance rules, or building reusable metrics, these best practices help ensure every dashboard and data exploration tells the right story, at the right time.
Ready to elevate your Looker experience? Our team can help you design a scalable LookML framework, optimize performance, and enable your business users to self-serve insights without compromising on governance.
Get in touch today and let’s turn your data into a competitive advantage.
Author: Umniyah Abbood
Date Published: Aug 21, 2025
