Playing GROUP BY and HAVING Clauses with LINQ

LINQ (Language INtegrated Query) is one of the most powerful tools in .NET world. With this, code can be much simpler than before, especially for iterating objects. Using LINQ gives developers very similar experience when they send a SELECT query to a database server. This is a very basic example of SELECT query on SQL Server and LINQ statement.



SELECT *
FROM dbo.Cars
WHERE Manufacturer = ‘Hyundai’



// LINQ
var cars1 = (from c in context.Cars
where c.Manufacturer == “Hyundai”
select c);

// Fluent API
var cars2 = context.Cars
.Where(c => c.Manufacturer == “Hyundai”);

Then, how can we use aggregation using GROUP BY and HAVING clauses? We can easily achieve this goal in SQL query. However, it’s not that easy in LINQ at the first look. Basically the philosophy in both SQL query and LINQ are the same as each other, so once we get used to it, we can make use of this as easy as SQL query. In this article, I’d like to introduce several approaches of using GROUP BY and HAVING clauses in LINQ. You can find this sample codes on GitHub:

https://github.com/devkimchi/Composite-Key-Group-By-LINQ

Preparation

Before start, we need data ingredients for fermentation. In the sample repository stated above, there’s a local DB that contains sample data. Logon to your local DB by accessing to (localdb)\v11.0 with your Windows account and attach the database or simply run the following script below.



CREATE TABLE [dbo].Cars NOT NULL,
[Name] nvarchar NOT NULL,
[Manufacturer] nvarchar NOT NULL,
[Year] [int] NULL,
[Price] [int] NULL,
CONSTRAINT [PK_Car] PRIMARY KEY CLUSTERED (
[CarId] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

SET IDENTITY_INSERT [dbo].[Cars] ON

INSERT [dbo].[Cars] ([CarId], [Name], [Manufacturer], [Year], [Price]) VALUES (1, N’Lantra’, N’Hyundai’, 2011, 100)
INSERT [dbo].[Cars] ([CarId], [Name], [Manufacturer], [Year], [Price]) VALUES (2, N’Lantra’, N’Hyundai’, 2012, 100)
INSERT [dbo].[Cars] ([CarId], [Name], [Manufacturer], [Year], [Price]) VALUES (3, N’Lantra’, N’Hyundai’, 2013, 101)
INSERT [dbo].[Cars] ([CarId], [Name], [Manufacturer], [Year], [Price]) VALUES (4, N’Lantra’, N’Hyundai’, 2014, 110)
INSERT [dbo].[Cars] ([CarId], [Name], [Manufacturer], [Year], [Price]) VALUES (5, N’Golf’, N’VW’, 2011, NULL)
INSERT [dbo].[Cars] ([CarId], [Name], [Manufacturer], [Year], [Price]) VALUES (6, N’Golf’, N’VW’, 2012, 110)
INSERT [dbo].[Cars] ([CarId], [Name], [Manufacturer], [Year], [Price]) VALUES (7, N’Golf’, N’VW’, 2013, 115)
INSERT [dbo].[Cars] ([CarId], [Name], [Manufacturer], [Year], [Price]) VALUES (8, N’Golf’, N’VW’, 2014, 120)
INSERT [dbo].[Cars] ([CarId], [Name], [Manufacturer], [Year], [Price]) VALUES (9, N’118’, N’BMW’, 2013, 110)
INSERT [dbo].[Cars] ([CarId], [Name], [Manufacturer], [Year], [Price]) VALUES (10, N’118’, N’BMW’, 2013, 115)
INSERT [dbo].[Cars] ([CarId], [Name], [Manufacturer], [Year], [Price]) VALUES (11, N’118’, N’BMW’, 2014, 115)
INSERT [dbo].[Cars] ([CarId], [Name], [Manufacturer], [Year], [Price]) VALUES (12, N’118’, N’BMW’, 2014, 116)
INSERT [dbo].[Cars] ([CarId], [Name], [Manufacturer], [Year], [Price]) VALUES (13, N’Yaris’, N’Toyota’, 2012, 105)
INSERT [dbo].[Cars] ([CarId], [Name], [Manufacturer], [Year], [Price]) VALUES (14, N’Yaris’, N’Toyota’, 2013, 106)
INSERT [dbo].[Cars] ([CarId], [Name], [Manufacturer], [Year], [Price]) VALUES (15, N’Yaris’, N’Toyota’, 2013, 107)
INSERT [dbo].[Cars] ([CarId], [Name], [Manufacturer], [Year], [Price]) VALUES (16, N’Yaris’, N’Toyota’, 2014, NULL)
INSERT [dbo].[Cars] ([CarId], [Name], [Manufacturer], [Year], [Price]) VALUES (17, N’Yaris’, N’Toyota’, 2014, 108)
INSERT [dbo].[Cars] ([CarId], [Name], [Manufacturer], [Year], [Price]) VALUES (18, N’i30’, N’Hyundai’, 2012, 107)
INSERT [dbo].[Cars] ([CarId], [Name], [Manufacturer], [Year], [Price]) VALUES (19, N’i30’, N’Hyundai’, 2013, 107)
INSERT [dbo].[Cars] ([CarId], [Name], [Manufacturer], [Year], [Price]) VALUES (20, N’i30’, N’Hyundai’, 2014, 108)
INSERT [dbo].[Cars] ([CarId], [Name], [Manufacturer], [Year], [Price]) VALUES (21, N’i30’, N’Hyundai’, 2014, 109)
INSERT [dbo].[Cars] ([CarId], [Name], [Manufacturer], [Year], [Price]) VALUES (22, N’Lantra’, N’Hyundai’, NULL, 120)
INSERT [dbo].[Cars] ([CarId], [Name], [Manufacturer], [Year], [Price]) VALUES (23, N’i30’, N’Hyundai’, NULL, 125)

SET IDENTITY_INSERT [dbo].[Cars] OFF

Basic GROUP BY Clause

With data above, let’s populate a basic GROUP BY clause using an SQL query.



SELECT Manufacturer, MAX(Price) AS MaxPrice, MIN(Price) AS MinPrice
FROM [dbo].[Cars]
GROUP BY Manufacturer

Its result will be:































Manufacturer MaxPrice MinPrice
BMW 116 110
Hyundai 125 100
Toyota 108 105
VW 120 110

With LINQ, try to achieve the same result. There are two different types of using LINQ:



// LINQ
var results1 = (from c in this._context.Cars
group c by c.Manufacturer into g
select new CarViewModel()
{
Manufacturer = g.Key,
MaxPrice = g.Max(q => q.Price),
MinPrice = g.Min(q => q.Price)
});

// Fluent API
var results2 = this._repository
.Get()
.GroupBy(c => c.Manufacturer, r => new { Manufacturer = r.Manufacturer, Price = r.Price })
.Select(g => new CarViewModel()
{
Manufacturer = g.Key,
MaxPrice = g.Max(q => q.Price),
MinPrice = g.Min(q => q.Price)
});

When running both LINQ statements, the auto-generated SQL scripts will look like:



SELECT
1 AS [C1],
[GroupBy1].[K1] AS [Manufacturer],
[GroupBy1].[A1] AS [C2],
[GroupBy1].[A2] AS [C3]
FROM ( SELECT
[Extent1].[Manufacturer] AS [K1],
MAX([Extent1].[Price]) AS [A1],
MIN([Extent1].[Price]) AS [A2]
FROM [dbo].[Cars] AS [Extent1]
GROUP BY [Extent1].[Manufacturer]
) AS [GroupBy1]

And its result will be:




































C1 Manufacturer C2 C3
1 BMW 116 110
1 Hyundai 125 100
1 Toyota 108 105
1 VW 120 110

As you can see the result retrieved by LINQ is literally the same as the one by SQL query. Now let’s move on to the next query using GROUP BY clause with a composite key.

Composite GROUP BY Clause

We’re getting a maximum and minimum car price by car name and manufacturer. With SQL query, this can be:



SELECT Manufacturer, Name, MAX(Price) AS MaxPrice, MIN(Price) AS MinPrice
FROM [dbo].[Cars]
GROUP BY Manufacturer, Name

It results in:










































Manufacturer Name MaxPrice MinPrice
BMW 118 116 110
Hyundai i30 125 107
Hyundai Lantra 120 100
Toyota Yaris 108 105
VW Golf 120 110

How can we achieve this in c# with LINQ? Let’s see the following code example:



// LINQ
var results1 = (from c in this._context.Cars
group c by new { Manufacturer = c.Manufacturer, Name = c.Name } into g
select new CarViewModel()
{
Manufacturer = g.Key.Manufacturer,
Name = g.Key.Name,
MaxPrice = g.Max(q => q.Price),
MinPrice = g.Min(q => q.Price)
});

// Fluent API
var results2 = this._repository
.Get()
.GroupBy(c => new { Manufacturer = c.Manufacturer, Name = c.Name },
(g, r) => new CarViewModel()
{
Manufacturer = g.Manufacturer,
Name = g.Name,
MaxPrice = r.Max(q => q.Price),
MinPrice = r.Min(q => q.Price)
});

As you can see the above code, grouping by two keys – Manufacturer and Name in this example – can be done by creating an anonymous object containing those two keys. I personally prefer to the latter one using Fluent API approach as it does look clear in terms of grouping. Both codes auto-generate the same SQL query of:



SELECT
1 AS [C1],
[GroupBy1].[K2] AS [Manufacturer],
[GroupBy1].[K1] AS [Name],
[GroupBy1].[A1] AS [C2],
[GroupBy1].[A2] AS [C3]
FROM ( SELECT
[Extent1].[Name] AS [K1],
[Extent1].[Manufacturer] AS [K2],
MAX([Extent1].[Price]) AS [A1],
MIN([Extent1].[Price]) AS [A2]
FROM [dbo].[Cars] AS [Extent1]
GROUP BY [Extent1].[Name], [Extent1].[Manufacturer]
) AS [GroupBy1]

And it results in:
















































C1 Manufacturer Name C2 C3
1 BMW 118 116 110
1 Hyundai i30 125 107
1 Hyundai Lantra 120 100
1 Toyota Yaris 108 105
1 VW Golf 120 110

We can confirm that both SQL query and LINQ statements returns the same result set. Now let’s move on to the next query with WHERE clause to filter out NULL values.

Combination of WHERE and GROUP BY Clause

The original data has two records having NULL values on their Year field. We need to filter them out to get more accurate maximum/minimum price range. For this, the simplq SQL query looks like:



SELECT Manufacturer, Name, MAX(Price) AS MaxPrice, MIN(Price) AS MinPrice
FROM [dbo].[Cars]
WHERE [Year] IS NOT NULL
GROUP BY Manufacturer, Name

It returns the record set of:










































Manufacturer Name MaxPrice MinPrice
BMW 118 116 110
Hyundai i30 109 107
Hyundai Lantra 110 100
Toyota Yaris 108 105
VW Golf 120 110

Let’s see the c# code example to get the same results.



// LINQ
var results1 = (from c in this._context.Cars
where c.Year != null
group c by new { Manufacturer = c.Manufacturer, Name = c.Name } into g
select new CarViewModel()
{
Manufacturer = g.Key.Manufacturer,
Name = g.Key.Name,
MaxPrice = g.Max(q => q.Price),
MinPrice = g.Min(q => q.Price)
});

// Fluent API
var results2 = this._repository
.Get()
.Where(c => c.Year != null)
.GroupBy(c => new { Manufacturer = c.Manufacturer, Name = c.Name },
(g, r) => new CarViewModel()
{
Manufacturer = g.Manufacturer,
Name = g.Name,
MaxPrice = r.Max(q => q.Price),
MinPrice = r.Min(q => q.Price)
});

As you can see above, either where keyword or Where() method has been added. Both also generate the same SQL queries like:



SELECT
1 AS [C1],
[GroupBy1].[K2] AS [Manufacturer],
[GroupBy1].[K1] AS [Name],
[GroupBy1].[A1] AS [C2],
[GroupBy1].[A2] AS [C3]
FROM ( SELECT
[Extent1].[Name] AS [K1],
[Extent1].[Manufacturer] AS [K2],
MAX([Extent1].[Price]) AS [A1],
MIN([Extent1].[Price]) AS [A2]
FROM [dbo].[Cars] AS [Extent1]
WHERE [Extent1].[Year] IS NOT NULL
GROUP BY [Extent1].[Name], [Extent1].[Manufacturer]
) AS [GroupBy1]

It returns the result of:
















































C1 Manufacturer Name C2 C3
1 BMW 118 116 110
1 Hyundai i30 109 107
1 Hyundai Lantra 110 100
1 Toyota Yaris 108 105
1 VW Golf 120 110

Alright. Now, we want to grab cars having more than one records. For the next section, HAVING clause will be used.

GROUP BY with HAVING Clause

Let’s dig down the SQL query more precisely – Manufacturer, Name and Year. Typical SQL query looks like:



SELECT Manufacturer, Name, [Year], MAX(Price) AS MaxPrice, MIN(Price) AS MinPrice
FROM [dbo].[Cars]
WHERE [Year] IS NOT NULL
AND Price IS NOT NULL
GROUP BY Manufacturer, Name, [Year]

SELECT Manufacturer, Name, [Year], MAX(Price) AS MaxPrice, MIN(Price) AS MinPrice
FROM [dbo].[Cars]
WHERE [Year] IS NOT NULL
AND Price IS NOT NULL
GROUP BY Manufacturer, Name, [Year]
HAVING COUNT([Year]) > 1

Both queries return records grouped by Manufacturer, Name and Year. You can see single records on the first query, while the second query doesn’t return a single record as the second query contains HAVING clause.

Therefore the second query returns:









































Manufacturer Name Year MaxPrice MinPrice
BMW 118 2013 115 110
BMW 118 2014 116 115
Hyundai i30 2014 109 108
Toyota Yaris 2013 107 106

How to achieve this goal in c#? Let’s see the following code:



// LINQ
var results1 = (from c in this._context.Cars
where c.Year != null && c.Price != null
group c by new { Manufacturer = c.Manufacturer, Name = c.Name, Year = c.Year } into g
where g.Count() > 1
select new CarViewModel()
{
Manufacturer = g.Key.Manufacturer,
Name = g.Key.Name,
Year = g.Key.Year,
MaxPrice = g.Max(q => q.Price),
MinPrice = g.Min(q => q.Price)
});

// Fluent API
var results2 = this._repository
.Get()
.Where(c => c.Year != null && c.Price != null)
.GroupBy(c => new { Manufacturer = c.Manufacturer, Name = c.Name, Year = c.Year },
(g, r) => new
{
Manufacturer = g.Manufacturer,
Name = g.Name,
Year = g.Year,
Count = r.Count(),
MaxPrice = r.Max(q => q.Price),
MinPrice = r.Min(q => q.Price)
})
.Where(p => p.Count > 1)
.Select(p => new CarViewModel()
{
Manufacturer = p.Manufacturer,
Name = p.Name,
Year = p.Year,
MaxPrice = p.MaxPrice,
MinPrice = p.MinPrice
});

HAVING clause is basically the same as WHERE clause, but only used with GROUP BY clause. Therefore, as you can see above, another where keyword or Where() method has been used for HAVING. Another point we should get noticed is that anonymouse type objects are very actively used for grouping. As a result, both codes internally generate the same SQL query like:



SELECT
1 AS [C1],
[GroupBy1].[K2] AS [Manufacturer],
[GroupBy1].[K1] AS [Name],
[GroupBy1].[K3] AS [Year],
[GroupBy1].[A2] AS [C2],
[GroupBy1].[A3] AS [C3]
FROM ( SELECT
[Extent1].[Name] AS [K1],
[Extent1].[Manufacturer] AS [K2],
[Extent1].[Year] AS [K3],
COUNT(1) AS [A1],
MAX([Extent1].[Price]) AS [A2],
MIN([Extent1].[Price]) AS [A3]
FROM [dbo].[Cars] AS [Extent1]
WHERE ([Extent1].[Year] IS NOT NULL) AND ([Extent1].[Price] IS NOT NULL)
GROUP BY [Extent1].[Name], [Extent1].[Manufacturer], [Extent1].[Year]
) AS [GroupBy1]
WHERE [GroupBy1].[A1] > 1

And its results are:














































C1 Manufacturer Name Year C2 C3
1 BMW 118 2013 115 110
1 BMW 118 2014 116 115
1 Hyundai i30 2014 109 108
1 Toyota Yaris 2013 107 106

Conclusion

So far, we have discussed how to use GROUP BY clause and HAVING clause in c# code with helps of LINQ. It’s not as easy as we expected. However, the basic usages are understood, by comparing to the traditional SQL queries, it will be more than easier. If we carefully use Where() method for HAVING, GROUP BY aggregation will be achievable.