How to Remove Duplicates from a Table in Power BI

Do you want to remove duplicates from your table visuals in Power BI?

Duplicate data can pose significant challenges in data analysis, as it can lead to erroneous results and distorted interpretations. The presence of duplicate records can skew statistical calculations, inflate data counts, and hinder the identification of true patterns and trends.

Consider the scenario of analyzing sales data containing duplicate customer orders. When each order is tallied multiple times, the resulting sales figures become inflated, compromising the accuracy of the sales performance representation. Likewise, the presence of duplicate entries in survey data can distort response distribution, concealing significant insights and preferences.

To address these concerns, Power BI offers user-friendly tools for removing duplicates. These tools are versatile as they enable you to identify and eliminate duplicate rows based on specific columns or a combination of columns.

In this post, you will learn how you can effectively cleanse your data of duplicates in Power BI, ensuring its accuracy and integrity.

Remove Duplicates in Power BI with Power Query

The first step in any data analysis task is data cleaning and transformation. Power Query is Power BI's data transformation tool, and it has many functions that help you reshape your data so that it is fit for analysis and returns accurate results.

In this section, you will see how you can remove duplicate data using tools in Power Query.

When importing data in Power BI, simply choose the Transform Data option in the data import preview window to access the Power Query interface. From there, you can effortlessly reshape your data, optimizing its structure and making it more insightful.

When the data is loaded into Power Query, simply right-click on the column that contains the duplicate values and choose the Remove Duplicates option from the drop-down options.

The Remove Duplicates option is also available in the Remove Rows dropdown menu under the Home tab.

This keeps one row for each distinct value in that column (normally the first occurrence) and deletes every other row that repeats a value, including the rest of that row's fields. In the example used here, the table goes from 18 rows to 5.

To remove duplicate records from the whole table, select all the columns in the query before choosing Remove Duplicates. You can do this by clicking the first column header and then holding down the Shift key while you click the last one, or by pressing Ctrl+A.

When you remove duplicates from all columns in the table, Power Query removes a row only when it has the same value as another row in every column.

This is unlike removing duplicates from one column where a record is removed from the table if that row has a duplicate value in the selected column.

Removing duplicates from a whole table usually removes fewer rows than removing duplicates from one column, because two rows are far less likely to match in every column than in a single column.
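The difference between the two behaviors can be sketched in M. The small Orders table below is made up for illustration (it is not the 18-row table from the example), and the step names are arbitrary.

```powerquery
let
    // Hypothetical sample data, not the table used in this article
    Orders = #table(
        {"Customer", "Product"},
        {
            {"Ada", "Laptop"},
            {"Ada", "Laptop"},   // exact copy of the first row
            {"Ada", "Mouse"},    // same customer, different product
            {"Ben", "Laptop"}
        }
    ),
    // One column selected: keeps one row per distinct Customer
    ByCustomer = Table.Distinct(Orders, {"Customer"}),
    // All columns: only rows that match in every column are removed
    WholeTable = Table.Distinct(Orders),
    Result = [
        ByCustomerRows = Table.RowCount(ByCustomer),
        WholeTableRows = Table.RowCount(WholeTable)
    ]
in
    Result
```

With this sample data, deduplicating on Customer alone leaves 2 rows, while deduplicating the whole table leaves 3, because only the exact copy is dropped.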

Using the Advanced Editor

The Advanced Editor is another option you can use to remove duplicate rows. The Advanced Editor in Power Query holds a record of every modification made to the data in M language. From here, you can easily modify data by writing your own M language queries.

One major advantage of using the Advanced Editor over Power Query’s UI menu is that you can skip some steps.

For instance, removing duplicates using the Remove Duplicates menu is at least a three-step process. With the Advanced Editor, you can reduce this to two steps using the Table.Distinct function.

Table.Distinct(table as table, optional equationCriteria as any)

This M table function removes duplicate rows from a table. The optional parameter, equationCriteria, specifies which columns are compared when looking for duplicates. When it is omitted, all columns are tested for duplication.
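As a minimal sketch of the optional parameter in use, the query below passes a list of column names so that rows count as duplicates only when that combination of columns matches. The table and its values are hypothetical.

```powerquery
let
    // Hypothetical sample data for illustration only
    Orders = #table(
        {"Customer", "Product", "Quantity"},
        {
            {"Ada", "Laptop", 1},
            {"Ada", "Laptop", 2},
            {"Ada", "Mouse", 1}
        }
    ),
    // Rows are duplicates when both Customer and Product match;
    // Quantity is ignored in the comparison
    ByCustomerAndProduct = Table.Distinct(Orders, {"Customer", "Product"})
in
    ByCustomerAndProduct
```

Here the two Ada/Laptop rows collapse into one even though their quantities differ, which may or may not be what you want. Choose the comparison columns with that in mind.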

#"Remove Duplicates" = Table.Distinct(Source{[Item="Table2",Kind="Table"]}[Data])

The Table.Distinct function takes a table as input and returns a new table with all duplicate rows removed. Here, Source is not a function but the previous query step, which holds a navigation table listing the items in the data source. The expression Source{[Item="Table2",Kind="Table"]} looks up the row for the table named Table2, and [Data] returns the contents of that table, which is the data to be deduplicated.

With the Advanced Editor, you can write an expression like this to import the data and remove duplicate rows in one go. This can shorten the query and make the data transformation phase of your analysis more efficient.
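Put together, a complete query in the Advanced Editor might look like the sketch below. It assumes the data comes from an Excel workbook, which the Item/Kind navigation in the expression suggests but the example does not state. The file path is hypothetical.

```powerquery
let
    // Hypothetical path; replace it with the location of your own workbook
    Source = Excel.Workbook(File.Contents("C:\Data\Sales.xlsx"), null, true),
    // Navigate to the table named Table2 and deduplicate it in the same step
    #"Remove Duplicates" = Table.Distinct(Source{[Item = "Table2", Kind = "Table"]}[Data])
in
    #"Remove Duplicates"
```

The navigation and the deduplication share a single step here, so the query has two steps in total.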

Remove Duplicates in Power BI with DAX

Data Analysis Expressions (DAX) is a formula language for performing calculations and data analysis in Power BI and Power Pivot models. DAX does not delete rows from the underlying table. Instead, you create measures that work on the distinct values in a column.

Measures in Power BI are formulas that perform calculations on data to derive insights. They are specifically designed for aggregating data for visualizations and reports. Measures are essential for summarizing and analyzing data in Power BI.

With measures, you can create aggregations based on the number of unique items in a column. This is because of two quite similar DAX functions – DISTINCT and VALUES.

Two related functions, DISTINCTCOUNT and DISTINCTCOUNTNOBLANK, count the distinct values instead of listing them. They differ only in how they treat blank values.

Let’s take a look at each one.

DISTINCT

The DISTINCT function returns a unique list of all the different values in a table or column. This means it will only return each value once even if it appears multiple times in a table or column. This is how the other variations of the distinct functions also work.

DISTINCT ( <ColumnNameorTableExpr> )

The DISTINCT function is used with either a column or a table as an argument. When a single column is passed, it returns a column of unique values. If a table expression is used as the argument, it returns all the columns and eliminates any duplicate rows.

Essentially, the DISTINCT function in DAX can be considered equivalent to the Remove Duplicates feature in Power Query.

EVALUATE
DISTINCT ( 'Product'[Color] )

Imagine you have a table with product details and the corresponding available colors. If you’re curious about all the unique product colors, you can utilize the DISTINCT function to obtain this information.

Because a measure must return a single value, the DISTINCT function, when used in a measure, requires another function that counts, filters, or aggregates the distinct values it returns.
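As a minimal sketch, a measure that counts the distinct colors could wrap DISTINCT in COUNTROWS. The measure name is hypothetical; 'Product'[Color] is the column used in the example.

```dax
-- Hypothetical measure name; 'Product'[Color] is the column from the example
Unique Colors =
COUNTROWS ( DISTINCT ( 'Product'[Color] ) )
```

COUNTROWS turns the one-column table returned by DISTINCT into a single number that a card or table visual can display.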

DISTINCTCOUNT and DISTINCTCOUNTNOBLANK

The DISTINCTCOUNT function returns the number of different values in a column, while DISTINCTCOUNTNOBLANK does the same thing while ignoring blank values.

DISTINCTCOUNTNOBLANK is a variant of DISTINCTCOUNT that excludes blank values. Both functions take a single column as their argument and return a single value.

DISTINCTCOUNT ( table[column] )

DISTINCTCOUNTNOBLANK ( table[column] )

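When there are no blank values, both functions return the same result. When the column does contain blanks, DISTINCTCOUNT counts the blank as one more distinct value, so its result is higher by one.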

The DISTINCTCOUNT and DISTINCTCOUNTNOBLANK functions provide the count of unique values in a specific column. They offer valuable insights into the diversity of values captured within the column.
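To compare the two counts side by side, a query along these lines can be run against a model that contains the 'Product'[Color] column from the earlier example. The measure names are hypothetical.

```dax
DEFINE
    -- Counts every distinct value; a blank counts as one value
    MEASURE 'Product'[Color Count] =
        DISTINCTCOUNT ( 'Product'[Color] )
    -- Counts distinct values and skips the blank
    MEASURE 'Product'[Color Count No Blank] =
        DISTINCTCOUNTNOBLANK ( 'Product'[Color] )
EVALUATE
ROW (
    "Color Count", [Color Count],
    "Color Count No Blank", [Color Count No Blank]
)
```

If the column held, say, Red, Blue, Red, and one blank, the first measure would return 3 and the second 2.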

VALUES

The VALUES function is nearly equivalent to the DISTINCT function. It takes a table or column as its argument and returns a table. The difference is that VALUES also includes the extra blank row that the engine adds when the table is on the one side of a one-to-many relationship and the related table contains values that violate referential integrity.

VALUES ( <TableNameOrColumnName> )

EVALUATE
VALUES ( 'Product'[Color] )
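One way to check whether the two functions differ in your own model is to count the rows each one returns. This sketch assumes the 'Product' table sits on the one side of a relationship; the column labels are arbitrary.

```dax
EVALUATE
ROW (
    -- DISTINCT ignores the blank row added for unmatched related rows
    "Rows from DISTINCT", COUNTROWS ( DISTINCT ( 'Product'[Color] ) ),
    -- VALUES includes that blank row when it exists
    "Rows from VALUES", COUNTROWS ( VALUES ( 'Product'[Color] ) )
)
```

If the second number is one higher than the first, the related table most likely contains keys with no match in 'Product'.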

By using DAX to work with the distinct values in each column, you can build measures that duplicate values would otherwise distort. This allows you to better understand the data and make more informed decisions.

Conclusions

Removing duplicates is an essential step in data cleaning and preparation for effective data analysis in Power BI. By eliminating redundant entries, you ensure the accuracy and integrity of your data, leading to more reliable and meaningful insights.

In Power BI, you can eliminate duplicates with the Remove Duplicates command or the Advanced Editor in Power Query. This removes redundant data before it is imported and reduces the size of the model.

However, when you cannot remove duplicates at the source, for example because the repeated values belong to rows you still need, DAX lets you derive insights from the distinct values without altering the data structure.

With Power BI, there’s no shortage of options when working with data with duplicate values.

About the Author

Oluwaseun Olatoye

Oluwaseun is a business intelligence analyst with expertise in Google Sheets, Excel, Power BI, and SQL. He has worked with various businesses to make data-driven decisions. He enjoys helping others learn and grow.
