Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introducing type based Token parsing and processing #2758

Draft
wants to merge 6 commits into
base: feat/sqlclientx
Choose a base branch
from

Conversation

cheenamalhotra
Copy link
Member

@cheenamalhotra cheenamalhotra commented Aug 7, 2024

This PR adds the type based TDS token parsing workflow.

Note: Perf improvements are subject to benchmark testing, but there are design options available to us that we will revisit in the async callbacks to improve state machine handling.

Keeping that work out of scope from this PR.

How will token parsing work?
The new Parser uses TokenStreamHandler to read token type and all related data for each token, which will result in customized 'Token' object to be returned to the parser. This customized token object will then be handled back in the parser's state information using 'TokenProcessors'. This way we are separating token parsing from processing into TDS Context, keeping SRP intact.

Further perf improvements can be targeted in token Parsers, keeping processors untouched.

@cheenamalhotra cheenamalhotra force-pushed the cheena/tdsparser-tokens branch 2 times, most recently from 38c5e9c to 39bc167 Compare August 7, 2024 01:32
@cheenamalhotra cheenamalhotra force-pushed the cheena/tdsparser-tokens branch 2 times, most recently from a82490d to f0390e9 Compare August 7, 2024 01:42
@cheenamalhotra cheenamalhotra force-pushed the cheena/tdsparser-tokens branch 3 times, most recently from 0a9bce0 to ef5c9eb Compare August 7, 2024 15:14
Copy link
Contributor

@mdaigle mdaigle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just curious to see how all of the different state objects come together.

{
internal class TdsCommandContext
{
// TDS stream processing variables
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably unnecessary comment?

/// <summary>
/// PLP data length indicator
/// </summary>
public ulong PlpLength;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be auto-properties rather public member variables.

/// <summary>
/// Buffer to handle low-level byte operations.
/// </summary>
internal class ByteBuffer : IEnumerable<byte>, IEquatable<ByteBuffer>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I'd like to see this as and the related changes as a separate PR. There's a lot of useful stuff in token types and stream parsing, but I think we need to rethink the changes for IO in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed a way to read Bytes off the stream in a buffer, therefore this came in here. This is temporary, and should not be needed later once we can support direct TdsStream buffer reading.


namespace Microsoft.Data.SqlClientX.Tds.State
{
internal class TdsCommandContext
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All things considered, I'm not really sure what the intention of this class is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This came because I was trying to separate context info for command/connection.. This is not used by parser as of now, because I'm only writing it for login flow, but in future this can be extended and used by parser. Will remove it as it has no purpose as of now.

/// <summary>
/// Captures context information required to run TDS parser operations.
/// </summary>
internal class TdsContext
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't think we need to be passing around a context object for TDS parsing operations. This feels to me like it's the same thing as the state object of old, just with the new, acceptable name "context".

The only situation where I see this being valuable is if the connection handler chain needs something to dump its results to and this is used to initialize a parser. But, since there isn't a parser in this PR, I don't see how all this fits together.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's going to be used to capture login response data from server that parser is reading off stream, and send it back to SqlConnector who will own the TdsContext.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is your plan still to have the tds parser be a static/stateless class? If not, then the the data from the server can be stored in the parser, no need for context objects. If it is ... then I think we really need to rediscuss why we're taking this approach.


namespace Microsoft.Data.SqlClientX.Tds.Tokens.Info
{
internal class InfoTokenParser : TokenParser
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments :)


namespace Microsoft.Data.SqlClientX.Tds.Tokens.LoginAck
{
internal class LoginAckTokenParser : TokenParser
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments


uint version = await tdsStream.TdsReader.ReadUInt32BEAsync(isAsync, ct).ConfigureAwait(false);
TdsVersion tdsVersion = LoginAckTokenParser.GetTdsVersion(version);
tdsStream.TdsVersion = tdsVersion;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not directly assign?


namespace Microsoft.Data.SqlClientX.Tds.Tokens.ReturnStatus
{
internal class ReturnStatusTokenParser : TokenParser
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments


namespace Microsoft.Data.SqlClientX.Tds.Tokens
{
internal sealed class TokenStreamHandler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments

@saurabh500
Copy link
Contributor

For env change types, do we need a new type for every env change with old and new value? Since the EnvChangeToken<T> could itself be enough to solve the purpose. This can be dereferenced using the right variable name, and ultimate discarded.

For each token, there seems to be a parser class. Can't the Token class itself have a static Self Decode(stream, isAsync ...) method, so that it can decode itself from the stream and provide the token.

/// <summary>
/// Captures TDS Snapshot state
/// </summary>
internal class TdsSnapshotState
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I am not sure if we will ever need the snapshots in completely managed I/O.

// get the labels
ushort numLabels = await tdsStream.TdsReader.ReadUInt16Async(isAsync, ct).ConfigureAwait(false);

List<Label> labels = new List<Label>(numLabels);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LIst translate well to the ReadOnlyCollection on SqlDataReader's API which exposes classification information. This is OK to maintain as a list.

throw new InvalidOperationException($"Unsupported EnvChange Token type: {tokenSubType}");
}

return await subTypeParsers[tokenSubType](tdsStream, isAsync, ct).ConfigureAwait(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Ben, however if you must use a dictionary then use the TryGet method. This way you can avoid 2 lookups at 116 and 121.

/// <summary>
/// Environment change token sub type.
/// </summary>
internal enum EnvChangeTokenSubType : byte
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be called EnvChangeType for simplicity.

@saurabh500
Copy link
Contributor

@cheenamalhotra comments aside, I want to test it to parse out the Login response at the end of the login handler.

Can you provide the sample snippet that I can try out with the LoginHandler to try out this change ?

@cheenamalhotra
Copy link
Member Author

cheenamalhotra commented Aug 13, 2024

@cheenamalhotra comments aside, I want to test it to parse out the Login response at the end of the login handler.

Can you provide the sample snippet that I can try out with the LoginHandler to try out this change ?

@saurabh500 LoginChainTest now works with Password provided. Do remember all the server response is being read, but context may not be updated perfectly post reading server response. This is somewhere we can play with managing Context information better, if any rework is needed. As of now what I've added is proposal for organizing context information, but feel free to suggest any reorganization/renaming as needed.

P.S. I'm yet to update the PR based on feedback here, so please give me some time to update it today.

@cheenamalhotra cheenamalhotra changed the title Introducing Token types for TDS Token stream parsing Introducing type based Token parsing and processing Aug 13, 2024
@cheenamalhotra cheenamalhotra added the Area\ClientX Issues that are targeted for ClientX codebase. label Aug 13, 2024
Copy link
Contributor

@edwardneal edwardneal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting aside the discussion going on about I/O and performance, a quick sense check. In most cases, I've just checked the enum values and semantics against the TDS specification. It looks good to me though - looking forward to testing it!

Comment on lines +12 to +20
/// <summary>
/// Column data format.
/// </summary>
AltMetadata = 0x88,

/// <summary>
/// Row of data.
/// </summary>
AltRow = 0xD3,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know .NET currently supports these tokens, but to flag any pending work: neither of these tokens are supported by SQL Server 2012 or above, so probably don't need to port this functionality over. We might still want to include the value in TokenType for completeness' sake, I've not got any comment for that.

Comment on lines +97 to +100
/// <summary>
/// Offset.
/// </summary>
Offset = 0x78,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to my comment for AltMetadata and AltRow - this was removed in TDS 7.2.

Comment on lines +32 to +35
/// <summary>
/// Column information in browse mode.
/// </summary>
COLNAME = 0xa0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little odd: it's not part of the TDS specification, but I can see some references to this in FreeTDS. Apparently it may have been removed in TDS 4.2.

/// <summary>
/// Table name.
/// </summary>
TabName = 0xA4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
TabName = 0xA4
TabName = 0xA4,
/// <summary>
/// Table valued parameter row.
/// </summary>
TvpRow = 0x01

return allErrors;
}

private void AddErrorsToCollection(SqlErrorCollection inCollection, ref SqlErrorCollection collectionToAddTo, ref bool broken)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private void AddErrorsToCollection(SqlErrorCollection inCollection, ref SqlErrorCollection collectionToAddTo, ref bool broken)
private void AddErrorsToCollection(SqlErrorCollection inCollection, SqlErrorCollection collectionToAddTo, ref bool broken)

/// <summary>
/// Severity.
/// </summary>
public byte Severity { get; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TDS spec names the field as Class, but also refers to it as a severity. SqlError uses the Class property, so I'd align with that and add a comment noting that it's also called Severity.

/// </summary>
/// <param name="featureId">Feature Id.</param>
/// <param name="buffer">Data buffer.</param>
public FeatureExtAckToken(FeatureId featureId, ByteBuffer buffer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this approach might change as a result of the I/O discussion, but this has similar semantics to ENVCHANGE but has a different data structure/inheritance hierarchy. It might be worth considering whether ENVCHANGE is too heavy, or whether FEATUREEXTACK is too lightweight.

/// <summary>
/// Severity.
/// </summary>
public byte Severity { get; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment to ErrorToken.Severity.

/// <summary>
/// Program version.
/// </summary>
public ProgVersion ProgVersion { get; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need ProgVersion, or would a Version instance serve our purpose better?

Comment on lines +12 to +15
/// <summary>
/// Version 7.1
/// </summary>
V7_1 = 0x71000001,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flagging more TDS versions which might not be necessary in the enum.

ServerProcessId = Interlocked.Increment(ref s_spoofedServerProcessId);

// TODO enable parser registration with Parser introduction.
var connString = new SqlConnectionString(dataSource.ConnectionString);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use explicit types instead of var? Occurred in several places

@cheenamalhotra cheenamalhotra marked this pull request as draft August 21, 2024 15:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area\ClientX Issues that are targeted for ClientX codebase.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants